1. INTRODUCTION

Decision procedures for satisfiability in theories of data types, such as arrays, lists
and records, are at the core of many state-of-the-art verification tools (e.g., PVS
[Owre et al. 1992], ACL2 [Kaufmann et al. 2000], Simplify [Detlefs et al. 2005], CVC
[Barrett et al. 2000], ICS [de Moura et al. 2004], CVC Lite [Barrett and Berezin
2004], Zap [Lahiri and Musuvathi 2005a], MathSAT [Bozzano et al. 2005], Yices
[Dutertre and de Moura 2006] and Barcelogic [Nieuwenhuis and Oliveras 2007]).
The design, proof of correctness, and implementation of satisfiability procedures^1
present several issues that have brought them to the forefront of research in
automated reasoning applied to verification.

First, most verification problems involve more than one theory, so that one needs 
procedures for combinations of theories, such as those pioneered by [Nelson and 
Oppen 1979] and [Shostak 1984]. Combination is complicated: for instance,
understanding, formalizing and proving correct Shostak's method required much work
(e.g., [Cyrluk et al. 1996; Rueß and Shankar 2001; Barrett et al. 2002b; Ganzinger
2002; Ranise et al. 2004]). The need for combination of theories means that
decision procedures ought to be easy to modify, extend, integrate into, or at least
interface with, other decision procedures or more general systems. Second,
satisfiability procedures need to be proved correct and complete: a key part is to show
that whenever the algorithm reports satisfiable, a model of the input does exist. 
Model-construction arguments for concrete procedures are specialized for those, so 
that each new procedure requires a new proof. Frameworks that offer a higher level 
of abstraction (e.g., [Bachmair et al. 2003; Ganzinger 2002]) often focus on
combining the quantifier-free theory of equality^2 with at most one additional theory, while
problems from applications, and existing systems, combine many. Third, although
systems begin to offer some support for adding theories, developers usually have to 
write a large amount of new code for each procedure, with little software reuse and 
high risk of errors. 

If one could use first-order theorem-proving strategies, combination would be- 
come conceptually much simpler, because combining theories would amount to 
giving as input to the strategy the union of the presentations of the theories. No 
ad hoc correctness and completeness proofs would be needed, because a sound and 
complete theorem-proving strategy is a semi-decision procedure for unsatisfiability. 
Existing first-order provers, that embody the results of years of research on data 
structures and algorithms for deduction, could be applied, or at least their code 
could be reused, offering a higher degree of assurance about soundness and com- 
pleteness of the procedure. Furthermore, theorem-proving strategies support proof 
generation and model generation, that are two more desiderata of satisfiability pro- 
cedures (e.g., [Necula and Lee 1998; Stump and Dill 2002; Lahiri and Musuvathi 
2005a]), in a theory-independent way. Indeed, if the input is unsatisfiable, the 
strategy generates a proof with no additional effort. If it is satisfiable, the strategy 
generates a saturated set that, if finite, may form a basis for model generation
[Caferra et al. 2004].



^1 In the literature on decision procedures, a "satisfiability procedure" is a decision procedure for
"satisfiability problems" that are sets of ground literals.

^2 Also known as EUF for Equality with Un-interpreted Function symbols. In the literature on
decision procedures, most authors use "interpreted" and "un-interpreted" to distinguish between
those symbols whose interpretation is restricted to the models of a given theory and those whose
interpretation is unrestricted. In the literature on rewriting, it is more traditional to use "definite"
in place of "interpreted" and "free" in place of "un-interpreted", as done in [Ganzinger 2002].
The crux is termination: in order to have a decision procedure, one needs to
prove that a complete theorem-proving strategy is bound to terminate on
satisfiability problems in the theories of interest. Results of this nature were obtained
in [Armando et al. 2003]: a refutationally complete rewrite-based inference system,
named $\mathcal{SP}$ (from superposition), was shown to generate finitely many clauses on
satisfiability problems in the theories of non-empty lists, arrays with or without
extensionality, encryption, finite sets with extensionality, homomorphism, and the
combination of lists and arrays. This work was extended in [Lynch and Morawska
2002], by using a meta-saturation procedure to add complexity characterizations^3.
Since the inference system $\mathcal{SP}$ reduces to ground completion on a set of ground
equalities and inequalities, it terminates and represents a decision procedure also
for the quantifier-free theory of equality^4.

These termination results suggest that, at least in principle, rewrite-based the- 
orem provers might be used "off the shelf" as validity checkers. The common 
expectation, however, is that validity checkers with built-in theories will be much 
faster than theorem provers that take theory presentations as input. In this
paper, we bring evidence that using rewrite-based theorem provers can be a practical
option. Our contributions include:

— New termination results, showing that $\mathcal{SP}$ generates finitely many clauses from
satisfiability problems in the theories of more data structures (records with or
without extensionality, and possibly empty lists) and in two fragments of integer
arithmetic (the theories of integer offsets and integer offsets modulo);

— A general modularity theorem that states sufficient conditions for $\mathcal{SP}$ to
terminate on satisfiability problems in a union of theories, if it terminates on the
satisfiability problems of each theory taken separately;

— A report on experiments where six sets of parametric synthetic benchmarks were 
given to the rewrite-based theorem prover E [Schulz 2002; 2004], the Cooperating 
Validity Checker CVC [Stump et al. 2002] and its successor CVC Lite [Barrett 
and Berezin 2004]: contrary to expectation, the general first-order prover with the
theory presentations as input was, overall, comparable with the validity checkers
with built-in theories, and in some cases even outperformed them. 

Among the termination results, the one for the theory of integer offsets is perhaps 
the most surprising, because the axiomatization is infinite. All the theories consid- 
ered in this paper (i.e., records with or without extensionality, lists, integer offsets, 
integer offsets modulo, arrays with or without extensionality and the quantifier-free 
theory of equality) satisfy the hypotheses of the modularity theorem, so that a fair
$\mathcal{SP}$-strategy is a satisfiability procedure for any of their combinations. This shows
the flexibility of the rewrite-based approach.

^3 Meta-saturation as in [Lynch and Morawska 2002] was later corrected in [Lynch and Tran 2007].

^4 That ground completion can be used to compute congruence closure has been known since
[Lankford 1975].

For the experiments, we chose a state-of-the-art theorem prover that implements
$\mathcal{SP}$, and two systems that combine decision procedures with built-in theories à la
Nelson-Oppen. At the time of these experiments, CVC and CVC Lite were the only
state-of-the-art tools implementing a correct and complete procedure for arrays with
extensionality^5, namely that of [Stump et al. 2001]. We worked with parametric
synthetic benchmarks, because they allow one to assess the scalability of systems 
by a sort of experimental asymptotic analysis. Three sets of benchmarks involve the 
theory of arrays with extensionality, one combines the theory of arrays with that 
of integer offsets, one is about queues, and one is about circular queues. In order 
to complete our appraisal, we tested E on sets of literals extracted from real-world 
problems of the UCLID suite [Lahiri and Seshia 2004], and found it solves them 
extremely fast. The selection of problems emphasizes the combination of theories, 
because it is relevant in practice. The synthetic benchmarks on queues feature the 
theories of records, arrays and integer offsets, because a queue can be modelled as a 
record, that unites a partially filled array with two indices that represent head and 
tail. Similarly, the benchmarks on circular queues involve the theories of records, 
arrays and integer offsets modulo, because a circular queue of length $k$ is a queue
whose indices take integer values modulo k. The UCLID problems combine the 
theory of integer offsets and the quantifier-free theory of equality. 

1.1 Previous work 

Most termination results for theorem-proving methods are based on identifying 
generic syntactic constraints that the input must satisfy to induce termination 
(see, e.g., [Fermüller et al. 2001; Caferra et al. 2004] for two overviews). Our results
are different, because they apply to specific theories, and in this respect they can 
be considered of a more semantic nature. There are a few other recent works that 
experiment with the application of first-order theorem provers to decidable the- 
ories of data structures. A proof of correctness of a basic Unix-style file system 
implementation was obtained in [Arkoudas et al. 2004], by having a proof checker
invoke the SPASS [Weidenbach et al. 1999] and Vampire [Riazanov and Voronkov 
2002] provers for non-inductive reasoning on lists and arrays, on the basis of their 
first-order presentations. The haRVey system [Deharbe and Ranise 2003] is a
verification tool based on the rewriting approach that we propound in this paper. It
integrates the E prover with a SAT solver, based on ordered binary decision di- 
agrams, to implement decision procedures for a few theories. Experiments with 
haRVey offered additional evidence of the effectiveness of the rewriting approach 
[Ranise and Deharbe 2003]. 

^5 Neither Simplify nor ICS is complete in this regard: cf. Section 5 in [Detlefs et al. 2005] and
[Rueß 2004], respectively.

The collection of theories considered here is different from that treated in
[Armando et al. 2003]. Lists à la Shostak (with cons, car, cdr and three axioms) and
lists à la Nelson-Oppen (with cons, car, cdr, atom and four axioms) were covered in
[Armando et al. 2003]. Both axiomatize non-empty lists, since there is no symbol
such as nil to represent the empty list. Here we consider a different presentation,
with cons, car, cdr, nil and six axioms, that allows for empty lists. In an approach
where the axioms are given in input to a theorem prover, a different
presentation represents a different problem, because termination on satisfiability problems
including a presentation does not imply termination on satisfiability problems
including another presentation. To wit, the finite saturated sets generated by $\mathcal{SP}$
are different (cf. Lemma 11 in this article with Lemmata 4.1 and 5.1 in [Armando
et al. 2003]). Application of a rewrite-based engine to the theories of records,
integer offsets and integer offsets modulo is studied here for the first time. Although
the presentation of the theory of records resembles that of arrays, the treatment 
of extensionality is different and the generated saturated sets are very different (cf. 
Lemmata 2 and 14 in this article). The only overlap with [Armando et al. 2003] is
represented by the theory of arrays, for which we redo only the case analysis of 
generated clauses, because that reported in [Armando et al. 2003] is incomplete (cf. 
Lemma 14 in this article with Lemma 7.2 in [Armando et al. 2003]). A short version 
and an extended abstract of this article were presented in [Armando et al. 2005b] 
and [Armando et al. 2005a], respectively. Very preliminary experiments with a few
of the synthetic benchmarks were reported in [Armando et al. 2002]. 

2. BACKGROUND 

We employ the basic notions from logic usually assumed in theorem proving. For
notation, the symbol $\simeq$ denotes equality;^6 $\bowtie$ stands for either $\simeq$ or $\not\simeq$; $=$ denotes
identity; $l, r, u, t$ are terms; $v, w, x, y, z$ are variables; other lower-case Latin letters
are constant or function symbols based on arity; $L$ is a literal; $C$ and $D$ denote
clauses, that is, multisets of literals interpreted as disjunctions; $\varphi$ is a formula; and
$\sigma$ is used for substitutions. More notation will be introduced as needed.

A theory is presented by a set of sentences, called its presentation or
axiomatization. Given a presentation $T$, the theory $Th\,T$ is the set of all its logical
consequences, or theorems: $Th\,T = \{\varphi \mid T \models \varphi\}$. Thus, a theory is a deductively-closed
presentation. An equational theory is a theory presented by a set of universally
quantified equations. A Horn clause is a clause with at most one positive literal,
and a definite Horn clause, or non-negative Horn clause, is a clause with exactly
one positive literal. A Horn theory is presented by a set of non-negative
Horn clauses, and a Horn equational theory is a Horn theory where the only
predicate is equality. From a model-theoretic point of view, the term theory refers to the
family of models of $T$, or T-models. A model is called trivial if its domain has only
one element. It is customary to ascribe to a model the cardinality of its domain,
so that a model is said to be finite or infinite if its domain is.

By T-atom, T-literal, T-clause, T-sentence and T-formula, we mean an atom, a
literal, a clause, a sentence and a formula, respectively, on T's signature, omitting
the T when it is clear from context. Equality is the only predicate, so that all
T-atoms are T-equations. The problem of T-satisfiability, or, equivalently,
satisfiability modulo T, is the problem of deciding whether a set $S$ of ground T-literals
is satisfiable in T, or has a T-model. The more general T-decision problem
consists of deciding whether a set $S$ of quantifier-free T-formulae is satisfiable in T.
In principle, the T-decision problem can be reduced to the T-satisfiability
problem via reduction of every quantifier-free T-formula to disjunctive normal form.
However, this is not practical in general. In this paper, we are concerned only with
T-satisfiability.^7 T-satisfiability is important, because many problems reduce to it:
the word problem, or the problem of deciding whether $T \models \forall\bar{x}\, l \simeq r$,
where $l \simeq r$ is a T-equation, the uniform word problem, or the problem of deciding
whether $T \models \forall\bar{x}\, C$, where $C$ is a Horn T-clause, and the clausal validity problem,
or the problem of deciding whether $T \models \forall\bar{x}\, C$, where $C$ is a T-clause, all reduce,
through skolemization, to deciding the unsatisfiability of a set of ground literals,
since all variables are universally quantified.

^6 The notation $\simeq$ is standard for unordered pair, so that $l \simeq r$ stands for $l \simeq r$ or $r \simeq l$.

^7 We discuss existing approaches and future directions for the T-decision problem in Section 7.

The traditional approach to T-satisfiability is that of "little" engines of proofs 
(e.g., [Shankar 2002]), which consists of building each theory into a dedicated in- 
ference engine. Since the theory is built into the engine, the input of the procedure 
consists of S only. The most basic example is that of congruence closure algorithms 
for satisfiability of sets of ground equalities and inequalities (e.g., [Shostak 1978; 
Nelson and Oppen 1980; Downey et al. 1980; Bachmair et al. 2003])^8. Theories are
built into the congruence closure algorithm by generating the necessary instances
of the axioms (see [Nelson and Oppen 1980] for non-empty lists) or by adding
preprocessing with respect to the axioms and suitable case analyses (see [Stump et al.
2001] for arrays with extensionality). Theories are combined by using the method
of [Nelson and Oppen 1979]. Two properties relevant to this method are convexity 
and stable infiniteness: 

Definition 1. A theory $Th\,T$ is convex, if for any conjunction $H$ of T-atoms
and for T-atoms $P_i$, $1 \leq i \leq n$, $T \models H \supset \bigvee_{i=1}^{n} P_i$ implies that there exists a $j$,
$1 \leq j \leq n$, such that $T \models H \supset P_j$.

In other words, if $\bigvee_{i=1}^{n} P_i$ is true in all models of $T \cup H$, there exists a $P_j$ that is
true in all models of $T \cup H$. This excludes the situation where all models of $T \cup H$
satisfy some $P_j$, but no $P_j$ is satisfied by all. Since Horn theories are those theories
whose models are closed under intersection - a fact due to Alfred Horn [Horn 1951,
Lemma 7] - it follows that Horn theories, hence equational theories, are convex.
The method of Nelson-Oppen without case analysis (also known as "branching" or
"splitting") is complete for combinations where all theories are convex (e.g., [Nelson
and Oppen 1979; Barrett et al. 2002b; Ganzinger 2002]). The method of Nelson- 
Oppen with case analysis is complete for combinations where all involved theories 
are stably infinite [Tinelli and Harandi 1996]: 

Definition 2. A theory $Th\,T$ is stably infinite, if for any quantifier-free
T-formula $\varphi$, $\varphi$ has a T-model if and only if it has an infinite T-model.

When combining the quantifier-free theory of equality with only one other theory,
the requirement of stable infiniteness can be dropped [Ganzinger 2002]. For
first-order logic, compactness implies that if a set of formulae has models with domains
of arbitrarily large finite cardinality, then it has models with infinite domains (e.g.,
[van Dalen 1989] for a proof). Thus, for $T$ a first-order presentation, and $\varphi$ a
quantifier-free T-formula, if $\varphi$ has arbitrarily large finite T-models, it has infinite
T-models; or, equivalently, if it has no infinite T-model, there is a finite bound
on the size of its T-models.^9 Using this property, one proves (cf. Theorem 4 in
[Barrett et al. 2002b]):



Theorem 2.1. (Barrett, Dill and Stump 2002) Every convex first-order theory
with no trivial models is stably infinite.

^8 Unknown to most, the conference version of [Downey et al. 1980] appeared in [Downey et al.
1978] with a different set of authors.

^9 A proof of this consequence of compactness in the context of decision procedures appears in
[Ganzinger et al. 2004], where $\varphi$ is assumed to have been reduced to disjunctive normal form, so
that the proof is done for a set of literals.



Thus, stable infiniteness is a weaker property characterizing the theories that can 
be combined according to the Nelson-Oppen scheme. 

If a decision procedure with a built-in theory is a little engine of proof, an inference
system for full first-order logic with equality can be considered a "big" engine of
proof (e.g., [Stickel 2002]). One such engine is the rewrite-based inference system
$\mathcal{SP}$, whose expansion and contraction inference rules are listed in Figures 1 and 2,
respectively. Expansion rules add what is below the inference line to the clause set 
that contains what is above the inference line. Contraction rules remove what is 
above the double inference line and add what is below the double inference line. 
Combinations of these inference rules or variants thereof form the core of most 
theorem provers for first-order logic with equality, such as Otter [McCune 2003], 
SPASS [Weidenbach et al. 1999], Vampire [Riazanov and Voronkov 2002], and E 
[Schulz 2002], to name a few. Formulations with different terminologies (e.g., left 
and right superposition in place of paramodulation and superposition) appear in 
the vast literature on the subject (e.g., [Plaisted 1993; Bonacina 1999; Nieuwenhuis 
and Rubio 2001; Dershowitz and Plaisted 2001] for surveys where more references 
can be found). 

A fundamental assumption of rewrite-based inference systems is that the universe
of terms, hence those of literals and clauses, is ordered by a well-founded ordering.
$\mathcal{SP}$ features a complete simplification ordering (CSO) $\succ$ on terms, extended to
literals and clauses by multiset extension as usual. A simplification ordering is
stable ($l \succ r$ implies $l\sigma \succ r\sigma$ for all substitutions $\sigma$), monotonic ($l \succ r$ implies
$t[l] \succ t[r]$ for all $t$, where the notation $t[l]$ represents a term where $l$ appears as
subterm in context $t$), and has the subterm property (i.e., it contains the subterm
ordering $\rhd$: $l \rhd r$ implies $l \succ r$). An ordering with these properties is well-founded.
A CSO is also total on ground terms. The most commonly used CSO's are instances
of the recursive path ordering (RPO) and the Knuth-Bendix ordering (KBO). An 
RPO is based on a precedence (i.e., a partial ordering on the signature) and the 
attribution of a status to each symbol in the signature (either lexicographic or 
multiset status). If all symbols have lexicographic status, the ordering is called 
lexicographic (recursive) path ordering (LPO). A KBO is based on a precedence 
and the attribution of a weight to each symbol. All instances of RPO and KBO 
are simplification orderings. All instances of KBO and LPO based on a total 
precedence are CSO's. Definitions, results and references on orderings for rewrite- 
based inference can be found in [Dershowitz and Plaisted 2001]. 

A well-founded ordering $\succ$ provides the basis for a notion of redundancy: a ground
clause $C$ is redundant in $S$ if, for ground instances $\{D_1, \ldots, D_k\}$ of clauses in $S$, it is
$\{D_1, \ldots, D_k\} \models C$ and $\{D_1, \ldots, D_k\} \prec \{C\}$; a clause is redundant if all its ground
instances are. An inference is redundant if it uses or generates a redundant clause,
and a set of clauses is saturated if all expansion inferences in the set are redundant.
In $\mathcal{SP}$, clauses deleted by contraction are redundant and expansion inferences that
do not respect the ordering constraints are redundant.
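
Superposition:
$$\begin{array}{c} C \lor l[u'] \simeq r \qquad D \lor u \simeq t \\ \hline (C \lor D \lor l[t] \simeq r)\sigma \end{array} \quad (i), (ii), (iii), (iv)$$

Paramodulation:
$$\begin{array}{c} C \lor l[u'] \not\simeq r \qquad D \lor u \simeq t \\ \hline (C \lor D \lor l[t] \not\simeq r)\sigma \end{array} \quad (i), (ii), (iii), (iv)$$

Reflection:
$$\begin{array}{c} C \lor u' \not\simeq u \\ \hline C\sigma \end{array} \quad \forall L \in C : (u' \not\simeq u)\sigma \not\prec L\sigma$$

Equational Factoring:
$$\begin{array}{c} C \lor u \simeq t \lor u' \simeq t' \\ \hline (C \lor t \not\simeq t' \lor u \simeq t')\sigma \end{array} \quad (i),\ \forall L \in \{u' \simeq t'\} \cup C : (u \simeq t)\sigma \not\prec L\sigma$$

where $\sigma$ is the most general unifier (mgu) of $u$ and $u'$, $u'$ is not a variable in Superposition
and Paramodulation, and the following abbreviations hold:

(i) is $u\sigma \not\preceq t\sigma$,
(ii) is $\forall L \in D : (u \simeq t)\sigma \not\preceq L\sigma$,
(iii) is $l[u']\sigma \not\preceq r\sigma$, and
(iv) is $\forall L \in C : (l[u'] \bowtie r)\sigma \not\preceq L\sigma$.

Fig. 1. Expansion inference rules of $\mathcal{SP}$.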

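Strict Subsumption:
$$\begin{array}{c} C \qquad D \\ \hline\hline C \end{array} \quad D \gtrdot C$$

Simplification:
$$\begin{array}{c} C[u] \qquad l \simeq r \\ \hline\hline C[r\sigma], \quad l \simeq r \end{array} \quad u = l\sigma,\ \ l\sigma \succ r\sigma,\ \ C[u] \succ (l \simeq r)\sigma$$

Deletion:
$$\begin{array}{c} C \lor t \simeq t \\ \hline\hline \ \end{array}$$

where $D \gtrdot C$ if $D \mathrel{\dot{\geq}} C$ and not $C \mathrel{\dot{\geq}} D$; and $D \mathrel{\dot{\geq}} C$ if $C\sigma \subseteq D$ (as multisets) for some
substitution $\sigma$. In practice, theorem provers such as E apply also subsumption of variants:
if $D \mathrel{\dot{\geq}} C$ and $C \mathrel{\dot{\geq}} D$, the oldest clause is retained.

Fig. 2. Contraction inference rules of $\mathcal{SP}$.
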
Let $\mathcal{SP}_\succ$ be $\mathcal{SP}$ with CSO $\succ$. An $\mathcal{SP}_\succ$-derivation is a sequence of sets of clauses

$$S_0 \vdash_{\mathcal{SP}_\succ} S_1 \vdash_{\mathcal{SP}_\succ} \cdots \; S_i \vdash_{\mathcal{SP}_\succ} \cdots$$

where at each step an $\mathcal{SP}_\succ$-inference is applied. A derivation is characterized by
its limit, defined as the set of persistent clauses

$$S_\infty = \bigcup_{j \geq 0} \bigcap_{i > j} S_i.$$

A derivation is fair if all expansion inferences become redundant eventually, and a
fair derivation generates a saturated limit.

Since inference systems are non-deterministic, a theorem-proving strategy is
obtained by adding a search plan that drives rule application. A search plan is fair if
it only generates fair derivations. An $\mathcal{SP}_\succ$-strategy is a theorem-proving strategy
with inference system $\mathcal{SP}_\succ$. If the inference system is refutationally complete and
the search plan is fair, the theorem-proving strategy is complete: $S_\infty$ is saturated
and the empty clause $\Box$ is in $S_\infty$ if and only if $S_0$ is unsatisfiable. A proof of the
refutational completeness of $\mathcal{SP}$ can be found in [Nieuwenhuis and Rubio 2001], and
definitions and references for redundancy, saturation and fairness in, e.g., [Bonacina
and Hsiang 1995; Nieuwenhuis and Rubio 2001; Bonacina and Dershowitz 2007].
For additional notations and conventions used in the paper, $Var(t)$ denotes the
set of variables occurring in $t$; the depth of a term $t$ is written $depth(t)$, and
$depth(t) = 0$, if $t$ is either a constant or a variable,
$depth(t) = 1 + \max\{depth(t_i) : 1 \leq i \leq n\}$, if $t$ is a compound term $f(t_1, \ldots, t_n)$.
A term is flat if its depth is 0 or 1. For
a literal, $depth(l \bowtie r) = depth(l) + depth(r)$. A positive literal is flat if its depth is
0 or 1. A negative literal is flat if its depth is 0. Let $\Gamma = \langle D, J \rangle$ be the interpretation
with domain $D$ and interpretation function $J$. Since our usage of interpretations is
fairly limited, we use $\Gamma$ without specifying $D$ or $J$ whenever possible. Lower case
letters surmounted by a hat, such as $\hat{d}$ and $\hat{e}$, denote elements of the domain $D$. As
usual, $[t]_\Gamma$ denotes the interpretation of term $t$ in $\Gamma$. Generalizing this notation, if
$c$ is a constant symbol and $f$ a function symbol, we use $[c]_\Gamma$ in place of $J(c)$ for the
interpretation of $c$ in $\Gamma$ and $[f]_\Gamma$ in place of $J(f)$ for the interpretation of $f$ in $\Gamma$.
Small capital letters, such as S, denote sorts. If there are many sorts, $D$ is replaced
by a tuple of sets, one per sort, and $[\mathrm{S}]_\Gamma$ denotes the one corresponding to S in $\Gamma$.
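
The depth and flatness conventions above can be illustrated with a minimal Python sketch. The term encoding is an assumption made only for this illustration: a constant or variable is a string, and a compound term with at least one argument is a tuple whose first component is the function symbol. The function names are hypothetical.

```python
# Assumed encoding (not from the paper): a constant or variable is a string,
# a compound term f(t1, ..., tn) with n >= 1 is the tuple ("f", t1, ..., tn).

def depth(t):
    """depth(t) = 0 for constants and variables, 1 + max over arguments otherwise."""
    if isinstance(t, str):
        return 0
    return 1 + max(depth(arg) for arg in t[1:])


def literal_depth(lhs, rhs):
    """Depth of a literal: the sum of the depths of its two sides."""
    return depth(lhs) + depth(rhs)


def is_flat_literal(lhs, rhs, positive):
    """A positive literal is flat at depth 0 or 1, a negative one only at depth 0."""
    d = literal_depth(lhs, rhs)
    return d <= 1 if positive else d == 0
```

For instance, under this encoding the literal store(a, i, v) ≃ c is flat, whereas the same two terms in a negative literal are not.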

3. REWRITE-BASED SATISFIABILITY PROCEDURES 

The rewriting approach to T-satisfiability aims at applying an inference system
such as $\mathcal{SP}$ to clause sets $S_0 = T \cup S$, where $T$ is a presentation of a theory and
$S$ a set of ground T-literals. This is achieved through the following phases that,
together, define a rewrite-based methodology for satisfiability procedures:

(1) T-reduction: specific inferences, depending on T, are applied to the problem
to remove certain literals or symbols and obtain an equisatisfiable T-reduced
problem.

(2) Flattening: all ground literals are transformed into flat literals, or flattened,
by introducing new constants and new equations, yielding an equisatisfiable
T-reduced flat problem. For example, a literal $store(a_1, i_1, v_1) \simeq store(a_2, i_2, v_2)$
is replaced by the literals $store(a_1, i_1, v_1) \simeq c_1$, $store(a_2, i_2, v_2) \simeq c_2$ and
$c_1 \simeq c_2$. Depending on T, flattening may precede or follow T-reduction.

(3) Ordering selection and termination: $\mathcal{SP}_\succ$ is shown to generate finitely many
clauses when applied fairly to a T-reduced flat problem. Such a result may
depend on simple properties of the ordering $\succ$: an ordering that satisfies them
is termed T-good, and an $\mathcal{SP}_\succ$-strategy is T-good if $\succ$ is. It follows that a fair
T-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate on a T-reduced flat problem.
The T-goodness requirement may be vacuous, meaning that any CSO is T-good.

This methodology can be fully automated, except for the proof of termination and
the definition of T-goodness: indeed, T-reduction is made of mechanical inferences,
flattening is a mechanical operation, and contemporary theorem provers feature
mechanisms to generate automatically orderings for given signatures and with given
properties.
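
As an illustration of why flattening is a mechanical operation, the following Python sketch names every compound subterm of a ground literal with a fresh constant. The tuple encoding of terms, the literal triples `(lhs, rhs, positive)` and the helper names are assumptions of this sketch, and the fresh constants `c1, c2, ...` are assumed not to occur in the input. The sketch names every compound term, so it may introduce more constants than strictly necessary.

```python
import itertools


def fresh_constants(prefix="c"):
    """Hypothetical supply of new constant names c1, c2, ... (assumed unused elsewhere)."""
    return (f"{prefix}{k}" for k in itertools.count(1))


def flatten_literal(lhs, rhs, positive, fresh):
    """Return flat literals (lhs, rhs, positive) equisatisfiable with the input literal.

    Terms are strings (constants) or tuples ("f", t1, ..., tn) for compound terms.
    """
    definitions = []

    def name(t):
        # Name the arguments first, then the term itself, so that each
        # defining equation f(c_1, ..., c_n) = c has depth 1.
        if isinstance(t, str):
            return t
        flat = (t[0],) + tuple(name(arg) for arg in t[1:])
        c = next(fresh)
        definitions.append((flat, c, True))
        return c

    left, right = name(lhs), name(rhs)
    return definitions + [(left, right, positive)]
```

Applied to the literal store(a1, i1, v1) ≃ store(a2, i2, v2), the sketch returns the three literals given in the example of phase (2).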

Let $\mathcal{E}$ denote the empty presentation, that is, the presentation of the
quantifier-free theory of equality. If $T$ is $\mathcal{E}$, $S$ is a set of ground equational literals built
from free function and constant symbols, and $\mathcal{SP}_\succ$ reduces to ground completion,
which is guaranteed to terminate, with no need of flattening, T-reduction or
T-goodness. Therefore, any fair $\mathcal{SP}_\succ$-strategy is a satisfiability procedure for the
quantifier-free theory of equality. In the rest of this section we apply the
rewrite-based methodology to several theories. For each theory, the signature contains
the function symbols indicated and a finite set of constant symbols. A supply of
countably many new constant symbols is assumed to be available for flattening.
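
For readers who want to experiment with satisfiability in the quantifier-free theory of equality, here is a minimal Python sketch. It is not ground completion and not the inference system of this paper: it is a naive congruence closure over the subterms of the input, swapped in because it decides the same problem. The term encoding (strings for constants, tuples for compound terms) and the function names are assumptions of the sketch, and no attempt is made at efficiency.

```python
def collect_subterms(t, acc):
    """Add t and all its subterms to the set acc."""
    acc.add(t)
    if not isinstance(t, str):
        for arg in t[1:]:
            collect_subterms(arg, acc)


def euf_satisfiable(equalities, disequalities):
    """Decide satisfiability of ground equalities and disequalities (lists of term pairs)."""
    terms = set()
    for l, r in equalities + disequalities:
        collect_subterms(l, terms)
        collect_subterms(r, terms)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def union(s, t):
        parent[find(s)] = find(t)

    for l, r in equalities:
        union(l, r)

    # Propagate congruence until a fixpoint: equal arguments force equal applications.
    compound = [t for t in terms if not isinstance(t, str)]
    changed = True
    while changed:
        changed = False
        for s in compound:
            for t in compound:
                if (s[0] == t[0] and len(s) == len(t) and find(s) != find(t)
                        and all(find(a) == find(b) for a, b in zip(s[1:], t[1:]))):
                    union(s, t)
                    changed = True

    # Satisfiable exactly when no disequality joins two terms of the same class.
    return all(find(l) != find(r) for l, r in disequalities)
```

For example, the set {a ≃ b, f(a) ≄ f(b)} is reported unsatisfiable, while {a ≃ b, f(a) ≄ c} is reported satisfiable.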

3.1 The theory of records 

Records aggregate attribute-value pairs. Let $Id = \{id_1, \ldots, id_n\}$ be a set of
attribute identifiers and $\mathrm{T}_1, \ldots, \mathrm{T}_n$ be $n$ sorts. Then, $\mathrm{REC}(id_1 : \mathrm{T}_1, \ldots, id_n : \mathrm{T}_n)$,
abbreviated REC, is the sort of records that associate a value of sort $\mathrm{T}_i$ to the
attribute identifier $id_i$, for $1 \leq i \leq n$. The signature of the theory of records has a
pair of function symbols $\mathrm{rselect}_i : \mathrm{REC} \to \mathrm{T}_i$ and $\mathrm{rstore}_i : \mathrm{REC} \times \mathrm{T}_i \to \mathrm{REC}$ for each
$i$, $1 \leq i \leq n$. The presentation, named $\mathcal{R}(id_1 : \mathrm{T}_1, \ldots, id_n : \mathrm{T}_n)$, or $\mathcal{R}$ for short, is
given by the following axioms, where $x$ is a variable of sort REC and $v$ is a variable
of sort $\mathrm{T}_i$:

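$$\forall x, v.\ \mathrm{rselect}_i(\mathrm{rstore}_i(x, v)) \simeq v \quad \text{for all } i,\ 1 \leq i \leq n \qquad (1)$$

$$\forall x, v.\ \mathrm{rselect}_j(\mathrm{rstore}_i(x, v)) \simeq \mathrm{rselect}_j(x) \quad \text{for all } i, j,\ 1 \leq i \neq j \leq n \qquad (2)$$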
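
To make axioms (1) and (2) concrete, the following Python sketch builds one intended model and checks both axioms exhaustively over a small finite domain. The choices here are assumptions of the sketch, not part of the presentation: records are modelled as immutable tuples, fields are indexed from 0, and all fields share one value sort.

```python
from itertools import product


def rstore(i, x, v):
    """Return a copy of record x whose field i holds v (records are immutable tuples)."""
    return x[:i] + (v,) + x[i + 1:]


def rselect(i, x):
    """Return the value of field i of record x."""
    return x[i]


def check_record_axioms(n, values):
    """Check axioms (1) and (2) on all records with n fields over the given values."""
    records = list(product(values, repeat=n))
    for x, v in product(records, values):
        for i in range(n):
            # Axiom (1): selecting the field just stored returns the stored value.
            if rselect(i, rstore(i, x, v)) != v:
                return False
            # Axiom (2): storing into field i leaves every other field j unchanged.
            for j in range(n):
                if j != i and rselect(j, rstore(i, x, v)) != rselect(j, x):
                    return False
    return True
```

Because tuples are compared componentwise, this particular model also identifies two records whenever all their fields agree.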

For the theory of records with extensionality, the presentation, named $\mathcal{R}^e$,
also includes the following axiom, which states that two records are equal if all their
fields are:

$$\forall x, y.\ \Bigl(\bigwedge_{i=1}^{n} \mathrm{rselect}_i(x) \simeq \mathrm{rselect}_i(y) \supset x \simeq y\Bigr) \qquad (3)$$

where $x$ and $y$ are variables of sort REC. $\mathcal{R}$ and $\mathcal{R}^e$ are Horn theories, and
therefore they are convex. We begin with $\mathcal{R}$-reduction, which allows us to reduce
$\mathcal{R}^e$-satisfiability to $\mathcal{R}$-satisfiability:

Definition 3. A set of ground $\mathcal{R}$-literals is $\mathcal{R}$-reduced if it contains no literal
$l \not\simeq r$, where $l$ and $r$ are terms of sort REC.

Given a set of ground $\mathcal{R}$-literals $S$ and a literal $L = l \not\simeq r \in S$, such that $l$ and $r$
are terms of sort REC, $\mathcal{R}$-reduction first replaces $L$ by the clause

$$C_L = \bigvee_{i=1}^{n} \mathrm{rselect}_i(l) \not\simeq \mathrm{rselect}_i(r)$$

that is the resolvent of $L$ and the clausal form of (3). Thus, if $S = S_1 \uplus S_2$, where
$S_2$ contains the literals $l \not\simeq r$ with $l$ and $r$ of sort REC and $S_1$ all the other literals,
$S$ is replaced by $S_1 \cup \{C_L : L \in S_2\}$. Then, this set of clauses is reduced into
disjunctive normal form, yielding a disjunction of $\mathcal{R}$-reduced sets of literals. Let
$Red_{\mathcal{R}}(S)$ denote the class of $\mathcal{R}$-reduced sets thus obtained.
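
The reduction just described can be sketched in Python as follows. The representation is an assumption of this sketch: literals are triples `(lhs, rhs, positive)`, terms are strings or tuples, the caller supplies a hypothetical predicate `is_record` that tells whether a term has sort REC, and the symbols rselect_i are written as the strings `"rselect1"`, `"rselect2"`, and so on. Disjunctive normal form is obtained by choosing one disjunct from each clause $C_L$.

```python
from itertools import product


def r_reduce(literals, is_record, n):
    """Return the list of R-reduced literal sets obtained from the given literals.

    literals: list of (lhs, rhs, positive); is_record: term -> bool; n: number of fields.
    """
    # S1: everything except disequalities between records; S2: those disequalities.
    s1 = [lit for lit in literals if lit[2] or not is_record(lit[0])]
    s2 = [lit for lit in literals if not lit[2] and is_record(lit[0])]

    # For each record disequality, build the clause C_L as a list of its n disjuncts.
    clauses = [
        [((f"rselect{i}", l), (f"rselect{i}", r), False) for i in range(1, n + 1)]
        for (l, r, _) in s2
    ]

    # Disjunctive normal form: pick one disjunct per clause, in all possible ways.
    return [s1 + list(choice) for choice in product(*clauses)]
```

With k record disequalities and n fields, the sketch returns n^k reduced sets; the literals it introduces are not flat and still have to go through flattening.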

Lemma 1. Given a set of ground $\mathcal{R}$-literals $S$, $\mathcal{R}^e \cup S$ is satisfiable if and only
if $\mathcal{R} \cup Q$ is, for some $Q \in Red_{\mathcal{R}}(S)$.

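Proof:

($\Leftarrow$) Let $\Gamma$ be a many-sorted model of $\mathcal{R} \cup Q$. The claim is that there exists an
interpretation $\Gamma'$ that satisfies $\mathcal{R}^e \cup S$. The only non-trivial part is to show that $\Gamma'$
satisfies the extensionality axiom of $\mathcal{R}^e$, because in order to satisfy extensionality
$\Gamma'$ needs to interpret the equality predicate $\simeq$ also on records, whereas $\Gamma$ does not.
To simplify notation, let $\simeq_\Gamma$ stand for $[\simeq]_\Gamma$ and $\simeq_{\Gamma'}$ stand for $[\simeq]_{\Gamma'}$. Then let $\Gamma'$ be
the interpretation that is identical to $\Gamma$, except that $\simeq_{\Gamma'}$ is defined as follows:

— for all $\hat{a}, \hat{b} \in [\mathrm{REC}]_\Gamma$, $\hat{a} \simeq_{\Gamma'} \hat{b}$ if and only if $[\mathrm{rselect}_i]_\Gamma(\hat{a}) \simeq_\Gamma [\mathrm{rselect}_i]_\Gamma(\hat{b})$ for all $i$,
$1 \leq i \leq n$, and

— for all $\hat{a}, \hat{b} \in [\mathrm{T}_i]_\Gamma$, $\hat{a} \simeq_{\Gamma'} \hat{b}$ if and only if $\hat{a} \simeq_\Gamma \hat{b}$, for all $i$, $1 \leq i \leq n$.

The relation $\simeq_{\Gamma'}$ is clearly an equivalence. To prove that it is a congruence, we only
need to show that if $\hat{a} \simeq_{\Gamma'} \hat{b}$, then $[\mathrm{rstore}_i]_\Gamma(\hat{a}, \hat{e}) \simeq_{\Gamma'} [\mathrm{rstore}_i]_\Gamma(\hat{b}, \hat{e})$ for all $i$, $1 \leq i \leq n$,
and $\hat{e} \in [\mathrm{T}_i]_\Gamma$. By way of contradiction, assume that $\hat{a} \simeq_{\Gamma'} \hat{b}$, but
$[\mathrm{rstore}_i]_\Gamma(\hat{a}, \hat{e}) \not\simeq_{\Gamma'} [\mathrm{rstore}_i]_\Gamma(\hat{b}, \hat{e})$ for some $i$, $1 \leq i \leq n$, and $\hat{e} \in [\mathrm{T}_i]_\Gamma$. In other words, by definition
of $\simeq_{\Gamma'}$, it is $[\mathrm{rselect}_k]_\Gamma([\mathrm{rstore}_i]_\Gamma(\hat{a}, \hat{e})) \not\simeq_\Gamma [\mathrm{rselect}_k]_\Gamma([\mathrm{rstore}_i]_\Gamma(\hat{b}, \hat{e}))$ for some $k$,
$1 \leq k \leq n$. There are two cases: either $k = i$ or $k \neq i$. If $k = i$, then, since $\Gamma$, whence
$\Gamma'$, is a model of axiom (1), it follows that $\hat{e} \not\simeq_\Gamma \hat{e}$, a contradiction. If $k \neq i$, since $\Gamma$,
whence $\Gamma'$, is a model of axiom (2), it follows that $[\mathrm{rselect}_k]_\Gamma(\hat{a}) \not\simeq_\Gamma [\mathrm{rselect}_k]_\Gamma(\hat{b})$,
which contradicts the assumption $\hat{a} \simeq_{\Gamma'} \hat{b}$. Thus, $\simeq_{\Gamma'}$ is well defined, and $\Gamma'$ is a
model of $\mathcal{R}^e \cup S$.
($\Rightarrow$) This case is simple and is omitted for brevity. $\Box$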

Termination depends on a case analysis showing that only certain clauses can be 
generated, and resting on a simple assumption on the CSO: 

Definition 4. A CSO $\succ$ is $\mathcal{R}$-good if $t \succ c$ for all ground compound terms $t$
and constants $c$.

Most orderings can meet this requirement easily: for instance, for RPO's, it is 
sufficient to assume a precedence where all constant symbols are smaller than all 
function symbols. 
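The remark on precedences can be checked on small cases. The sketch below is illustrative only: it implements the lexicographic path ordering (one member of the RPO family) on ground terms, and the term encoding, the function name `lpo_greater` and the sample precedence are assumptions made for this example, not part of the paper.

```python
from typing import Dict, Tuple

# A ground term is a tuple (symbol, arg1, ..., argk); a constant is (symbol,).
Term = Tuple


def lpo_greater(s: Term, t: Term, prec: Dict[str, int]) -> bool:
    """Return True if s > t in the lexicographic path ordering induced by prec."""
    if s == t:
        return False
    f, s_args = s[0], s[1:]
    g, t_args = t[0], t[1:]
    # Case 1: some argument of s is equal to or greater than t.
    if any(a == t or lpo_greater(a, t, prec) for a in s_args):
        return True
    # The remaining cases require s to dominate every argument of t.
    if not all(lpo_greater(s, b, prec) for b in t_args):
        return False
    # Case 2: the root of s is larger in the precedence.
    if prec[f] > prec[g]:
        return True
    # Case 3: same root symbol, compare arguments lexicographically.
    if f == g:
        for a, b in zip(s_args, t_args):
            if a != b:
                return lpo_greater(a, b, prec)
    return False


# Hypothetical precedence: every constant is smaller than every function symbol.
PREC = {"r": 0, "r2": 1, "e": 2, "rselect1": 3, "rstore1": 4}
CONSTANTS = [("r",), ("r2",), ("e",)]
COMPOUND = [
    ("rselect1", ("r",)),
    ("rstore1", ("r",), ("e",)),
    ("rselect1", ("rstore1", ("r2",), ("e",))),
]
```

With this precedence every compound term in `COMPOUND` is greater than every constant in `CONSTANTS`, which is the condition of Definition 4 on this sample.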

Lemma 2. All clauses in the limit $S_\infty$ of the derivation $S_0 \vdash_{\mathcal{SP}} S_1 \ldots S_i \vdash_{\mathcal{SP}} \ldots$
generated by a fair $\mathcal{R}$-good $\mathcal{SP}_\succ$-strategy from $S_0 = \mathcal{R} \cup S$, where $S$ is an $\mathcal{R}$-reduced
set of ground flat $\mathcal{R}$-literals, belong to one of the following classes, where $r, r'$ are
constants of sort REC, and $e, e'$ are constants of sort $T_i$ for some $i$, $1 \le i \le n$:

i) the empty clause;

ii) the clauses in $\mathcal{R}$:

ii.a) $\mathrm{rselect}_i(\mathrm{rstore}_i(x, v)) \simeq v$, for all $i$, $1 \le i \le n$,

ii.b) $\mathrm{rselect}_j(\mathrm{rstore}_i(x, v)) \simeq \mathrm{rselect}_j(x)$, for all $i, j$, $1 \le i \ne j \le n$;

iii) ground flat unit clauses of the form:
iii.a) $r \simeq r'$,
iii.b) $e \simeq e'$,
iii.c) $e \not\simeq e'$,

iii.d) $\mathrm{rstore}_i(r, e) \simeq r'$, for some $i$, $1 \le i \le n$,
iii.e) $\mathrm{rselect}_i(r) \simeq e$, for some $i$, $1 \le i \le n$;

iv) $\mathrm{rselect}_i(r) \simeq \mathrm{rselect}_i(r')$, for some $i$, $1 \le i \le n$.

Proof: we recall that inequalities $r \not\simeq r'$ are not listed in (iii), because $S$ is $\mathcal{R}$-reduced.
All clauses in the classes above are unit clauses, and therefore have a
unique maximal literal. Since $\succ$ is an $\mathcal{R}$-good CSO, the left side of each literal
is maximal (for (iii.a), (iii.b), (iii.c) and (iv) this can be assumed without loss of
generality). The proof is by induction on the index $i$ of the sequence $\{S_i\}_i$. For
$i = 0$, all clauses in $S_0$ are in (ii) or (iii). For the inductive case, we assume the
claim is true for $i$ and we prove it for $i + 1$. Equational factoring applies to a clause
with at least two positive literals, and therefore does not apply to unit clauses.
Reflection may apply only to a clause in (iii.c) to yield the empty clause. For
binary inferences, we consider each class in turn:

— Inferences within (ii): None applies.

— Inferences within (iii): The only possible inferences produce ground flat unit
clauses in (iii) or the empty clause.

— Inferences between a clause in (iii) and a clause in (ii): A superposition of a
clause in (iii.d) into one in (ii.a) yields a clause of the form $\mathrm{rselect}_i(r') \simeq e$,
which is in (iii.e). A superposition of a clause in (iii.d) into one in (ii.b) yields a
clause of the form $\mathrm{rselect}_j(r') \simeq \mathrm{rselect}_j(r)$, which is in (iv). No other inferences
apply.

— Inferences between a clause in (iv) and a clause in (ii)-(iv): the only applicable
inferences are simplifications between clauses in (iii.a), (iii.e) and (iv), which
yield clauses in (iii.e) or (iv). □

Since only finitely many clauses of the kinds enumerated in Lemma 2 can be
built from a finite signature, the saturated limit $S_\infty$ is finite, and a fair derivation
is bound to terminate:

Lemma 3. A fair $\mathcal{R}$-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate when applied
to $\mathcal{R} \cup S$, where $S$ is an $\mathcal{R}$-reduced set of ground flat $\mathcal{R}$-literals.

Theorem 3.1. A fair $\mathcal{R}$-good $\mathcal{SP}_\succ$-strategy is a polynomial satisfiability procedure
for $\mathcal{R}$ and an exponential satisfiability procedure for $\mathcal{R}^e$.

Proof: it follows from Lemmas 1 and 3 that a fair $\mathcal{R}$-good $\mathcal{SP}_\succ$-strategy is a
satisfiability procedure for $\mathcal{R}$ and $\mathcal{R}^e$. For the complexity, let $m$ be the number of
subterms occurring in the input set of literals $S$. Let $h$ be the number of literals
of $S$ in the form $l \not\simeq r$, with $l$ and $r$ terms of sort REC. $Red_{\mathcal{R}}(S)$ contains $n^h$ sets,
where $n$ is a constant, the number of field identifiers in the presentation. Since $h$ is
$O(m)$, $Red_{\mathcal{R}}(S)$ contains $O(n^m)$ sets. Since flattening is an $O(m)$ operation, after
flattening there are $O(n^m)$ sets, each with $O(m)$ subterms. For each set, inspection
of the types of clauses and inferences allowed by Lemma 3 shows that the number
of generated clauses is $O(m^2)$. In other words, the size of the set of clauses during
the derivation is bound by a constant $k$ which is $O(m^2)$. Since each inference step
takes polynomial time in $k$, the procedure is polynomial for each set, and therefore
for $\mathcal{R}$, and exponential for $\mathcal{R}^e$. □
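The count of $n^h$ sets in $Red_{\mathcal{R}}(S)$ can be illustrated with a small enumeration. The sketch below assumes, as the count suggests, that each record disequality $l \not\simeq r$ is replaced by one field disequality $\mathrm{rselect}_i(l) \not\simeq \mathrm{rselect}_i(r)$, with an independent choice of $i$ per disequality. The literal encoding (triples tagged `"eq"`, `"neq"`, `"rec_neq"`) and the name `r_reduce` are hypothetical.

```python
from itertools import product


def r_reduce(literals, n_fields):
    """Enumerate candidate R-reduced sets for a list of literals.

    Each literal is a triple (kind, lhs, rhs), where kind is "eq", "neq",
    or "rec_neq" (a disequality between two terms of sort REC).
    """
    record_diseqs = [lit for lit in literals if lit[0] == "rec_neq"]
    others = [lit for lit in literals if lit[0] != "rec_neq"]
    reduced_sets = []
    # One field index is chosen independently for every record disequality.
    for choice in product(range(1, n_fields + 1), repeat=len(record_diseqs)):
        picked = [
            ("neq", f"rselect{i}({l})", f"rselect{i}({r})")
            for i, (_, l, r) in zip(choice, record_diseqs)
        ]
        reduced_sets.append(others + picked)
    return reduced_sets


# Two record disequalities over records with three fields.
SAMPLE = [("rec_neq", "r1", "r2"), ("rec_neq", "r3", "r4"), ("eq", "e1", "e2")]
REDUCED = r_reduce(SAMPLE, 3)
```

Here `REDUCED` has $3^2 = 9$ members, and none of them contains a disequality between records, which is why the procedure is exponential for $\mathcal{R}^e$ while remaining polynomial on each individual set.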

3.2 The theory of integer offsets 

The theory of integer offsets is a fragment of the theory of the integers, which is
applied in verification (e.g., [Bryant et al. 2002]). Its signature does not assume
sorts, or assumes a single sort for the integers, and has two unary function symbols
$s$ and $p$, that represent the successor and predecessor functions, respectively. Its
presentation, named $\mathcal{I}$, is given by the following infinite set of sentences, as in, e.g.,
[Ganzinger et al. 2004]:

$\forall x.\ s(p(x)) \simeq x$ (4)

$\forall x.\ p(s(x)) \simeq x$ (5)

$\forall x.\ s^i(x) \not\simeq x$ for $i > 0$ (6)

where $s^1(x) = s(x)$, $s^{i+1}(x) = s(s^i(x))$ for $i \ge 1$, and the sentences in (6) are
called acyclicity axioms. For convenience, let $Ac = \{\forall x.\ s^i(x) \not\simeq x : i > 0\}$ and
$Ac(n) = \{\forall x.\ s^i(x) \not\simeq x : 0 < i \le n\}$. Like the theory of records, $\mathcal{I}$ is also Horn, and
therefore convex.

Remark 1. Axiom (5) implies that $s$ is injective:

$\forall x, y.\ s(x) \simeq s(y) \supset x \simeq y$ (7)

Indeed, consider the set $\{p(s(x)) \simeq x,\ s(a) \simeq s(b),\ a \not\simeq b\}$, where $\{s(a) \simeq s(b),\ a \not\simeq b\}$
is the clausal form of the negation of (7). Superposition of $s(a) \simeq s(b)$ into $p(s(x)) \simeq x$
generates $p(s(b)) \simeq a$. Superposition of $p(s(b)) \simeq a$ into $p(s(x)) \simeq x$ generates
$a \simeq b$, that contradicts $a \not\simeq b$.

Definition 5. A set of ground flat $\mathcal{I}$-literals is $\mathcal{I}$-reduced if it does not contain
occurrences of $p$.

Given a set $S$ of ground flat $\mathcal{I}$-literals, the symbol $p$ may appear only in literals of
the form $p(c) \simeq b$. Negative ground flat literals have the form $c \not\simeq b$, and therefore
do not contain $p$. $\mathcal{I}$-reduction consists of replacing every equation $p(c) \simeq b$ in $S$
by $c \simeq s(b)$. The resulting $\mathcal{I}$-reduced form of $S$ is denoted $Red_{\mathcal{I}}(S)$. $\mathcal{I}$-reduction
reduces satisfiability with respect to $\mathcal{I}$ to satisfiability with respect to $Ac$, so that
axioms (4) and (5) can be removed, provided lemma (7) is added:

Lemma 4. Let $S$ be a set of ground flat $\mathcal{I}$-literals. $\mathcal{I} \cup S$ is satisfiable if and only
if $Ac \cup \{(7)\} \cup Red_{\mathcal{I}}(S)$ is.

Proof:

($\Rightarrow$) It follows from Remark 1 and the observation that $\mathcal{I} \cup S \models Red_{\mathcal{I}}(S)$, since
$c \simeq s(b)$ is a logical consequence of $\mathcal{I}$ and $p(c) \simeq b$, as it can be generated by a
superposition of $p(c) \simeq b$ into axiom $s(p(x)) \simeq x$.

($\Leftarrow$) Let $\Gamma$ be a model of $Ac \cup \{(7)\} \cup Red_{\mathcal{I}}(S)$ and let $D$ be its domain. We build a
model $\Gamma'$ of $\mathcal{I} \cup S$. $\Gamma'$ interprets all constants in $S$ in the same way as $\Gamma$ does. The
crucial point is that $p$ does not occur in $Ac \cup \{(7)\} \cup Red_{\mathcal{I}}(S)$, so that $\Gamma$ does not
interpret it, whereas $\Gamma'$ should. Since not all elements of $D$ may have a predecessor in
$D$ itself, the domain $D'$ of $\Gamma'$ will be a superset of $D$, containing as many additional
elements as are needed to interpret $p$. We construct recursively families of sets
$\{D_i\}_i$ and functions $\{s_i\}_i$ and $\{p_i\}_i$ in such a way that, at the limit, all elements
have a predecessor:

— Base case: $i = 0$. Let $D_0 = D$, $s_0 = [s]_\Gamma$ and $p_0 = \emptyset$. By establishing $s_0 = [s]_\Gamma$,
the interpretation of $s$ on $D$ is preserved: for all $d \in D$, $[s]_\Gamma(d) = [s]_{\Gamma'}(d)$. We
start by partitioning $D_0$ into the subset $E_0$ of elements that are successors of
some other element, and the subset $F_0$ of those that are not: $E_0 = \{e : s_0(d) =
e \text{ for some } d \in D_0\}$ and $F_0 = D_0 \setminus E_0$. For all $e \in E_0$, we define $p_1(e) = d$
such that $s_0(d) = e$: such a $d$ exists by definition of $E_0$ and it is unique because
$\Gamma \models (7)$. Thus, $p_1$ is well-defined on $E_0$. Next, we define $p_1$ on $F_0$. Let $D'_0$ be
a set disjoint from $D_0$ and let $\eta_0 : F_0 \to D'_0$ be a bijection: intuitively, $\eta_0$ maps
each element of $F_0$ to its predecessor that is missing in $D_0$. Indeed, for all $d \in F_0$,
we define $p_1(d) = \eta_0(d)$. Then we define $s_1$: for all $e \in D_0$, $s_1(e) = s_0(e)$ and for
all $e \in D'_0$, $s_1(e) = d$, where $d$ is the element such that $\eta_0(d) = e$. Establishing
that $D_1 = D_0 \uplus D'_0$ closes the base case.

— Recursive case: suppose that for $i \ge 1$, we have a $D_{i-1} \subseteq D_i$, where we have
defined $s_i$ and $p_i$ in such a way that for all $d \in D_{i-1}$, there exists an $e \in D_i$ such
that $s_i(d) = e$ and $p_i(e) = d$. On the other hand, there may be elements in
$D_i$ that are not successors of any other element, so that their predecessor is not
defined. Thus, let $E_i = \{e : s_i(d) = e \text{ for some } d \in D_i\}$ and $F_i = D_i \setminus E_i$. Let $D'_i$
be a set of new elements and $\eta_i : F_i \to D'_i$ a bijection. Then, let $D_{i+1} = D_i \uplus D'_i$.
For $p_{i+1}$, for all $e \in E_i$, $p_{i+1}(e) = d$ such that $s_i(d) = e$ and for all $d \in F_i$,
$p_{i+1}(d) = \eta_i(d)$. For $s_{i+1}$, for all $e \in D_i$, $s_{i+1}(e) = s_i(e)$ and for all $e \in D'_i$,
$s_{i+1}(e) = d$, where $d$ is the element such that $\eta_i(d) = e$.

Then we define $D' = \bigcup_i D_i$, $[s]_{\Gamma'} = \bigcup_i s_i$ and $[p]_{\Gamma'} = \bigcup_i p_i$. We show that $\Gamma' \models
\mathcal{I} \cup S$. Axioms (4) and (5) are satisfied, since, by construction, for all $e \in D'$,
$[p]_{\Gamma'}(e) = d$ and $[s]_{\Gamma'}(d) = e$. The only equations in $S \setminus Red_{\mathcal{I}}(S)$ are the $p(c) \simeq b$
for which $s(b) \simeq c \in Red_{\mathcal{I}}(S)$. Let $[b]_{\Gamma'} = d$ and $[c]_{\Gamma'} = e$: since $\Gamma \models s(b) \simeq c$, it
follows that $e \in E_0$, and $\Gamma' \models p(c) \simeq b$. □

Example 1. Let $S = \{s(c) \simeq c'\}$ and let $\Gamma$ be the model with domain $\mathbb{N}$, such
that $c$ is interpreted as $0$ and $c'$ as $1$. Then $D_0 = \mathbb{N}$, $E_0 = \mathbb{N} \setminus \{0\}$ and $F_0 = \{0\}$.
At the first step of the construction, we can take $D'_0 = \{-1\}$, and have $p_1(0) = -1$
and $s_1(-1) = 0$. Then $F_1 = \{-1\}$; and we can take $D'_1 = \{-2\}$, and so on. At the
limit, $D'$ is the set $\mathbb{Z}$ of the integers.
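The syntactic side of Lemma 4, namely the reduction step that replaces every equation $p(c) \simeq b$ by $c \simeq s(b)$, is simple enough to sketch directly. The encoding below is an assumption made for illustration: constants are strings, applications of $s$ or $p$ are pairs such as `("p", "c")`, and literals are triples `(kind, lhs, rhs)`. The name `i_reduce` is hypothetical.

```python
def i_reduce(literals):
    """Return the I-reduced form of a list of ground flat literals.

    Every equation p(c) = b is replaced by c = s(b); all other literals,
    which cannot contain p when the input is flat, are kept unchanged.
    """
    reduced = []
    for kind, lhs, rhs in literals:
        if kind == "eq" and isinstance(lhs, tuple) and lhs[0] == "p":
            c = lhs[1]
            reduced.append(("eq", c, ("s", rhs)))
        else:
            reduced.append((kind, lhs, rhs))
    return reduced


S = [("eq", ("p", "c"), "b"), ("neq", "c", "b"), ("eq", ("s", "a"), "b")]
RED_S = i_reduce(S)
```

The output has the same number of literals as the input and no occurrence of `"p"`, in line with Definition 5.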

The next step is to bound the number of axioms in $Ac$ needed to solve the
problem. The intuition is that the bound will be given by the number of elements
whose successor is determined by a constraint $s(c) \simeq c'$ in $S$. Such a constraint
establishes that, in any model $\Gamma$ of $S$, the successor of $[c]_\Gamma$ must be $[c']_\Gamma$. We call
s-free an element that is not thus constrained:

Definition 6. Let $S$ be a satisfiable $\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals, $\Gamma$ be
a model of $S$ with domain $D$ and $C_S$ be the set of constants $C_S = \{c : s(c) \simeq c' \in S\}$.
An element $d \in D$ is s-free in $S$ for $\Gamma$, if for no $c \in C_S$, it is the case that $[c]_\Gamma = d$.

We shall see that it is sufficient to consider $Ac(n)$, where $n$ is the cardinality of
$C_S$.
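Since the bound on the acyclicity axioms is the cardinality of the constant set of Definition 6, it may help to see how that set is read off from a set of literals. The following sketch uses a hypothetical encoding (constants as strings, $s(c)$ as the pair `("s", "c")`, literals as triples) and a hypothetical function name; it accepts $s(c)$ on either side of an equation.

```python
def s_constrained_constants(literals):
    """Collect the constants c such that some equation s(c) = c' occurs in the input."""
    constants = set()
    for kind, lhs, rhs in literals:
        if kind != "eq":
            continue
        for side in (lhs, rhs):
            if isinstance(side, tuple) and side[0] == "s":
                constants.add(side[1])
    return constants


# The set {s(c1) = c2, s(c1) = c3, c2 = c3}: s is applied only to c1.
S_SHARED = [("eq", ("s", "c1"), "c2"), ("eq", ("s", "c1"), "c3"), ("eq", "c2", "c3")]
# A worst case: every occurrence of s applies to a different constant.
S_WORST = [("eq", ("s", "c1"), "c2"), ("eq", ("s", "c2"), "c3"), ("eq", ("s", "c3"), "c4")]
```

For `S_SHARED` the result is the single constant `c1`, so one acyclicity axiom suffices; for `S_WORST` the bound equals the number of occurrences of $s$.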

Example 2. If $S = \{s(c_1) \simeq c_2,\ s(c_1) \simeq c_3,\ c_2 \simeq c_3\}$, then $C_S = \{c_1\}$ and $|C_S| = 1$.
In the worst case, however, all occurrences of $s$ apply to different constants, so
that $|C_S|$ is the number of occurrences of $s$ in $S$.

We begin with a notion of s-path that mirrors the paths in the graph $(D, [s]_\Gamma)$
defined by an interpretation $\Gamma$:

Definition 7. Let $S$ be a satisfiable $\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals and
let $\Gamma$ be a model of $S$ with domain $D$. For all $m \ge 2$, a tuple $\langle d_1, s, d_2, s, \ldots, d_m, s \rangle$
is an s-path of length $m$ if

(1) $\forall i, j$, $1 \le i \ne j \le m$, $d_i \ne d_j$ and

(2) $\forall i$, $1 \le i < m$, $d_{i+1} = [s]_\Gamma(d_i)$.

It is an s-cycle if additionally, $[s]_\Gamma(d_m) = d_1$.

It is clear to see that $\Gamma \models Ac(n)$ if and only if $\Gamma$ has no s-cycles of length smaller
or equal to $n$.
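On a finite structure the correspondence between $Ac(n)$ and short s-cycles can be tested directly. The sketch below assumes the interpretation of $s$ is given as a total Python dictionary on a finite domain (an assumption for illustration; the models built in the proofs are infinite) and checks the axioms $s^i(x) \not\simeq x$ for $0 < i \le n$ by iteration. The name `satisfies_ac` is hypothetical.

```python
def satisfies_ac(succ, n):
    """Return True if s^i(d) != d for every element d and every i with 0 < i <= n.

    succ maps each element of a finite domain to its successor.
    """
    for d in succ:
        x = d
        for _ in range(n):
            x = succ[x]
            if x == d:
                # d lies on a cycle of length at most n.
                return False
    return True


# A single cycle of length 5: 0 -> 1 -> 2 -> 3 -> 4 -> 0.
FIVE_CYCLE = {i: (i + 1) % 5 for i in range(5)}
```

The five-element cycle satisfies $Ac(4)$ but not $Ac(5)$, which is the situation handled by Lemma 6: a model of $Ac(n)$ may still contain a cycle of length $n + 1$ that has to be broken.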

Lemma 5. Let $S$ be an $\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals with $|C_S| = n$. If
there is an s-path $p$ of length $m > n$ in a model $\Gamma$ of $S$, then $p$ includes an element
that is s-free in $S$.

Proof: by way of contradiction, assume that no element in $p = \langle d_1, s, d_2, s, \ldots, d_m, s \rangle$
is s-free in $S$. By Definition 6, this means that for all $j$, $1 \le j \le m$, there is a
constant $c_j \in C_S$ such that $[c_j]_\Gamma = d_j$. Since by Definition 7 all elements in an
s-path are distinct, it follows that $C_S$ should contain at least $m$ elements, or $n \ge m$,
which contradicts the hypothesis that $m > n$. □

Lemma 6. Let $S$ be an $\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals with $|C_S| = l$. For
$n \ge l$, if $Ac(n) \cup \{(7)\} \cup S$ is satisfiable then $Ac(n + 1) \cup \{(7)\} \cup S$ is.

Proof: let $\Gamma = \langle D, J \rangle$ be a model of $Ac(n) \cup \{(7)\} \cup S$. $\Gamma$ has no s-cycles of length
smaller or equal to $n$. We build a model $\Gamma'$ with no s-cycles of length smaller
or equal to $n + 1$. Let $P = \{p : p \text{ is an s-cycle of length } n + 1\}$. If $P = \emptyset$,
$\Gamma \models Ac(n + 1)$ and $\Gamma'$ is $\Gamma$ itself. If $P \ne \emptyset$, there is some $p \in P$. Since $n + 1 > l$,
by Lemma 5, there is some $d$ in $p$ that is s-free in $S$ for $\Gamma$. Let $E_p = \{e_j : j \ge 0\}$
be a set disjoint from $D$, and let $J_p$ be the interpretation function that is identical
to $J$, except that $J_p(s)(d) = e_0$ and $J_p(s)(e_j) = e_{j+1}$ for all $j \ge 0$. By extending
$D$ into $D \cup E_p$ and extending $J$ into $J_p$, we obtain a model where the s-cycle $p$
has been broken. By repeating this transformation for all $p \in P$, we obtain the $\Gamma'$
sought for. Indeed, $\Gamma' \models Ac(n + 1)$, because it has no s-cycles of length smaller
or equal to $n + 1$. $\Gamma' \models S$, because it interprets constants in the same way as $\Gamma$,
and for each $s(c) \simeq e \in S$, $[c]_\Gamma$ is not s-free in $S$, which means $[s(c)]_{\Gamma'} = [s(c)]_\Gamma$.
To see that $\Gamma' \models (7)$, let $d$ and $d'$ be two elements such that $[s]_{\Gamma'}(d) = [s]_{\Gamma'}(d')$. If
$[s]_{\Gamma'}(d) \in D$, then $[s]_{\Gamma'}(d) = [s]_\Gamma(d)$, so that $d = d'$, because $\Gamma \models (7)$. If $[s]_{\Gamma'}(d) \notin D$,
then $[s]_{\Gamma'}(d) = e$ for some $e$ introduced by the above construction, so that $d$ is the
unique element whose successor is $e$, and $d = d'$. □

By compactness, we have the following: 

Corollary 1. Let $S$ be an $\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals with $|C_S| = n$.
$Ac \cup \{(7)\} \cup S$ is satisfiable if and only if $Ac(n) \cup \{(7)\} \cup S$ is.

Proof: the "only if" direction is trivial, and for the "if" direction, induction using
Lemma 6 shows that for all $k \ge 0$, if $Ac(n) \cup \{(7)\} \cup S$ is satisfiable, then so is
$Ac(n + k) \cup \{(7)\} \cup S$. □

Definition 8. A CSO $\succ$ is $\mathcal{I}$-good if $t \succ c$ for all constants $c$ and all terms $t$
whose root symbol is $s$.

For instance, a precedence where all constant symbols are smaller than $s$ will yield
an $\mathcal{I}$-good RPO.

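Lemma 7. All clauses in the limit $S_\infty$ of the derivation $S_0 \vdash_{\mathcal{SP}} S_1 \ldots S_i \vdash_{\mathcal{SP}} \ldots$
generated by a fair $\mathcal{I}$-good $\mathcal{SP}_\succ$-strategy from $S_0 = Ac(n) \cup \{(7)\} \cup S$, where $S$ is an
$\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals with $|C_S| = n$, belong to one of the following
classes, where $b_1, \ldots, b_k$, $d_1, \ldots, d_k$, $c$, $d$ and $e$ are constants ($k \ge 0$):

i) the empty clause;

ii) the clauses in $Ac(n) \cup \{(7)\}$:

ii.a) $s^i(x) \not\simeq x$, for all $i$, $0 < i \le n$,

ii.b) $s(x) \not\simeq s(y) \lor x \simeq y$;

iii) ground flat unit clauses of the form:

iii.a) $c \simeq d$,

iii.b) $c \not\simeq d$,

iii.c) $s(c) \simeq d$;

iv) other clauses of the following form:

iv.a) $s(x) \not\simeq d \lor x \simeq c \lor \bigvee_{i=1}^{k} d_i \not\simeq b_i$,

iv.b) $c \simeq e \lor \bigvee_{i=1}^{k} d_i \not\simeq b_i$,

iv.c) $c \not\simeq e \lor \bigvee_{i=1}^{k} d_i \not\simeq b_i$,

iv.d) $s(c) \simeq e \lor \bigvee_{i=1}^{k} d_i \not\simeq b_i$,

iv.e) $s^j(c) \not\simeq e \lor \bigvee_{i=1}^{k} d_i \not\simeq b_i$, $1 \le j \le n - 1$.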

Proof: since $\succ$ is a CSO, the first literal is the only maximal literal in (ii.b). Since
it is $\mathcal{I}$-good, the first literal is the only maximal literal in (iv.a), (iv.d) and (iv.e).
For the same reason, the left hand side is maximal in the maximal literals in (iii.c),
(iv.a), (iv.d) and (iv.e). The proof is by induction on the sequence $\{S_i\}_i$. For the
base case, all clauses in $S_0$ are in (ii) or (iii). For the inductive case, we consider
all possible inferences, excluding upfront equational factoring, which applies to a
clause with at least two positive literals, and therefore does not apply to Horn
clauses.

— Inferences within (ii): Reflection applies to (ii.b) to generate $x \simeq x$, that gets
deleted by deletion.

— Inferences within (iii): The only possible inferences produce ground flat unit
clauses in (iii) or the empty clause.

— Inferences between a clause in (iii) and a clause in (ii): A paramodulation of
an equality of kind (iii.c) into an inequality of type (ii.a) yields inequalities
$s^{i-1}(d) \not\simeq c$, $1 \le i \le n$, that are in (iv.e) with $k = 0$ (for $i > 1$) or (iii.b) (for
$i = 1$). A paramodulation of a (iii.c) equality into (ii.b) yields $s(x) \not\simeq d \lor x \simeq c$,
which is in (iv.a) with $k = 0$.

— Inferences between a clause in (iv) and a clause in (ii): A paramodulation of a
clause in (iv.d) into (ii.a) produces a clause in (iv.c) or (iv.e), and a paramodulation
of a clause in (iv.d) into (ii.b) produces a clause in (iv.a).

— Inferences between a clause in (iv) and a clause in (iii): A simplification of a
clause in (iv) by an equality in (iii.a) or (iii.c) generates another clause in (iv).
Paramodulating an equality of kind (iii.c) into a (iv.a) clause yields a (iv.b)
clause. Similarly, superposing a (iii.c) unit with a (iv.d) clause gives a (iv.b)
clause. The only possible remaining inferences between a clause in (iv) and one
in (iii) are paramodulations or superpositions of a (iv.b) clause into clauses in
(iii), that add clauses in (iv.b), (iv.c) and (iv.d).

— Inferences within (iv): Reflection applies to clauses in (iv.b) and (iv.c) to yield
clauses in (iv.b) or (iii.a) and (iv.c) or (iii.b), respectively. A paramodulation or
superposition of a (iv.b) clause into a (iv.b), (iv.c), (iv.d) or (iv.e) clause generates
clauses also in (iv.b), (iv.c), (iv.d) or (iv.e), respectively. A superposition of a
clause of kind (iv.d) into a (iv.a) clause gives a clause in (iv.b). A superposition
between two (iv.d) clauses adds a (iv.b) clause. A paramodulation of a clause of
type (iv.d) into a (iv.e) clause yields a clause in (iv.e) or (iv.c). □

Given a finite signature, only finitely many clauses of the types allowed by 
Lemma 7 can be formed. Thus, we have: 

Lemma 8. A fair $\mathcal{I}$-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate when applied
to $Ac(n) \cup \{(7)\} \cup S$, where $S$ is an $\mathcal{I}$-reduced set of ground flat $\mathcal{I}$-literals with
$|C_S| = n$.

Theorem 3.2. A fair $\mathcal{I}$-good $\mathcal{SP}_\succ$-strategy is an exponential satisfiability procedure
for $\mathcal{I}$.

Proof: the main result follows from Lemma 4, Corollary 1 and Lemma 8. For the
complexity, let $m$ be the number of subterms occurring in the input set of literals $S$.
$Red_{\mathcal{I}}(S)$ has the same number of subterms as $S$, since $\mathcal{I}$-reduction replaces literals
of the form $p(c) \simeq b$ by literals of the form $c \simeq s(b)$. Flattening is $O(m)$. The
number $n$ of retained acyclicity axioms, according to Lemma 8, is also $O(m)$, since
in the worst case it is given by the number of occurrences of $s$ in $S$. By the proof
of Lemma 7, at most $h = O(m^2)$ distinct literals and at most $O(2^h)$ clauses can be
generated. Thus, the size of the database of clauses during the derivation is bound
by a constant $k$ which is $O(2^h)$. Since each inference step takes polynomial time in
$k$, the overall procedure is $O(2^{m^2})$. □

Corollary 2. A fair $\mathcal{SP}_\succ$-strategy is a polynomial satisfiability procedure for
the theory presented by the set of acyclicity axioms $Ac$.

Proof: the proof of Lemma 7 shows that if the input includes only $Ac(n) \cup S$, the
only generated clauses are finitely many ground flat unit clauses from $S$ (inferences
within (iii)), and finitely many inequalities in the form $s^{i-1}(d) \not\simeq c$, for $1 \le i \le n$,
by paramodulation of equalities $s(c) \simeq d \in S$ into axioms $s^i(x) \not\simeq x$ (inferences
between a clause in (iii) and a clause in (ii)). It follows that the number of clauses
generated during the derivation is $O(m^2)$, where $m$ is the number of subterms
occurring in the input set of literals. The size of the database of clauses during the
derivation is bound by a constant $k$ which is $O(m^2)$, and since each inference step
takes polynomial time in $k$, a polynomial procedure results. □

3.3 The theory of integer offsets modulo

The above treatment extends to the theory of integer offsets modulo, which makes
it possible to describe data structures with indices ranging over the integers modulo
$k$, such as circular queues. A presentation for this theory, named $\mathcal{I}_k$, is obtained
from $\mathcal{I}$ by replacing $Ac$ with the following $k$ axioms

$\forall x.\ s^i(x) \not\simeq x$ for $1 \le i \le k - 1$ (8)

$\forall x.\ s^k(x) \simeq x$ (9)

where $k > 1$. $\mathcal{I}_k$ also is Horn and therefore convex.
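As a sanity check on this presentation, the intended structure (the integers modulo $k$, with $s$ and $p$ adding and subtracting one) can be tested against axioms (4), (5), (8) and (9) by brute force. The sketch below is illustrative only; the function names and the returned dictionary keyed by axiom number are assumptions of this example.

```python
def iterate(f, times, x):
    """Apply f to x the given number of times."""
    for _ in range(times):
        x = f(x)
    return x


def check_offsets_mod_k(k, s, p, domain):
    """Report which of axioms (4), (5), (8), (9) hold for s and p on a finite domain."""
    return {
        4: all(s(p(x)) == x for x in domain),
        5: all(p(s(x)) == x for x in domain),
        8: all(iterate(s, i, x) != x for x in domain for i in range(1, k)),
        9: all(iterate(s, k, x) == x for x in domain),
    }


def z_mod(k, step=1):
    """Successor and predecessor on {0, ..., k-1} that move by `step` modulo k."""
    return (lambda x: (x + step) % k), (lambda x: (x - step) % k), range(k)


s5, p5, d5 = z_mod(5)
GOOD = check_offsets_mod_k(5, s5, p5, d5)
# Moving by 2 modulo 6 returns to the start after only 3 steps, so (8) fails for k = 6.
s6, p6, d6 = z_mod(6, step=2)
BAD = check_offsets_mod_k(6, s6, p6, d6)
```

With the unit step every axiom holds, while the step-2 structure on six elements satisfies (4), (5) and (9) but violates (8), showing that the inequalities in (8) are what forces cycles to have length exactly $k$.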

Definition 5 and Lemma 4 apply also to $\mathcal{I}_k$, whereas Lemma 6 is no longer
necessary, because $\mathcal{I}_k$ is finite to begin with. Termination is guaranteed by the
following lemma, where $C(k) = \{\forall x.\ s^k(x) \simeq x\}$:

Lemma 9. A fair $\mathcal{I}$-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate when applied
to $Ac(k - 1) \cup C(k) \cup \{(7)\} \cup S$, where $S$ is an $\mathcal{I}$-reduced set of ground flat $\mathcal{I}_k$-literals.

Proof: the proof of termination rests on the proof of Lemma 7, with $n = k - 1$
and the following additional cases to account for the presence of $C(k)$. As far as
inferences between axioms are concerned (i.e., within group (ii) in the proof of
Lemma 7), $C(k)$ does not introduce any, because $s^k(x) \simeq x$ cannot paramodulate
into $s^i(x) \not\simeq x$, since $i < k$, and cannot paramodulate into (7), since $k > 1$. For
inferences between axioms and literals in $S$ (i.e., groups (ii) and (iii) in the proof
of Lemma 7), the presence of $C(k)$ introduces superpositions of literals $s(c) \simeq d \in S$
into $s^k(x) \simeq x$, generating $s^{i-1}(d) \simeq c$ for $1 \le i \le k$. If we use $j$ in place of $i - 1$
and $n$ in place of $k - 1$, we have $s^j(d) \simeq c$ for $0 \le j \le n$. Excluding the cases $j = 0$
and $j = 1$ that are already covered by classes (iii.a) and (iii.c) of Lemma 7, we
have an additional class of clauses, with respect to those of Lemma 7:

v) $s^j(d) \simeq c$ for $2 \le j \le n$.

Thus, we only need to check the inferences induced by clauses of type (v). There
are only two possibilities. Paramodulations of equalities in (v) into inequalities in
(ii.a) give more clauses in (iv.e) with $k = 0$. Paramodulations of equalities in (v)
into clauses in (iv.e) give more clauses in (iv.c) or (iv.e). Since only finitely many
clauses of types (i)-(v) can be formed from a finite signature, termination follows. □

Theorem 3.3. A fair $\mathcal{I}$-good $\mathcal{SP}_\succ$-strategy is an exponential satisfiability procedure
for $\mathcal{I}_k$.

Proof: it follows the same pattern as the proof of Theorem 3.2, with Lemma 9 in
place of Lemma 8. □

Alternatively, since $\mathcal{I}_k$ is finite, it is possible to omit $\mathcal{I}$-reduction and show
termination on the original problem format. The advantage is that it is not necessary
to include the injectivity property (7), so that the resulting procedure is polynomial.
Furthermore, abandoning the framework of $\mathcal{I}$-reduction, that was conceived
to handle the infinite presentation of $\mathcal{I}$, one can add axioms for $p$ that are dual of
(8) and (9), resulting in the presentation $\mathcal{I}'_k$ made of (4), (5), (8), (9) and:

$\forall x.\ p^i(x) \not\simeq x$ for $1 \le i \le k - 1$ (10)

$\forall x.\ p^k(x) \simeq x$ (11)

with $k > 1$. $\mathcal{I}'_k$ is also Horn and therefore convex.

Definition 9. A CSO $\succ$ is $\mathcal{I}'_k$-good if $t \succ c$ for all ground compound terms $t$
and constants $c$.

Lemma 10. A fair $\mathcal{I}'_k$-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate when applied
to $\mathcal{I}'_k \cup S$, where $S$ is a set of ground flat $\mathcal{I}'_k$-literals.

Proof: termination follows from the general observation that the only persistent
clauses, that can be generated by $\mathcal{SP}_\succ$ from $\mathcal{I}'_k \cup S$, are unit clauses $l \bowtie r$, such
that $l$ and $r$ are terms in the form $s^j(u)$ or $p^j(u)$, where $0 \le j \le k - 1$ and $u$ is either
a constant or a variable. Indeed, if a term in this form with $j \ge k$ were generated,
it would be simplified by axioms (9) $s^k(x) \simeq x$ or (11) $p^k(x) \simeq x$. Similarly, if a
term where $s$ is applied over $p$ or vice versa were generated, it would be simplified
by axioms (4) $s(p(x)) \simeq x$ or (5) $p(s(x)) \simeq x$. Given a finite number of constants
and with variants removed by subsumption, the bound on term depth represented
by $k$ implies that there are only finitely many such clauses. □

Theorem 3.4. A fair $\mathcal{I}'_k$-good $\mathcal{SP}_\succ$-strategy is a polynomial satisfiability procedure
for $\mathcal{I}'_k$.

Proof: termination was established in Lemma 10. To see that the procedure is
polynomial, let $m$ be the number of subterms in the input set of ground literals.
After flattening, we have $O(m)$ subterms, and since $\mathcal{I}'_k$ has $O(k)$ subterms, the
input to the $\mathcal{SP}_\succ$-strategy has $O(m + k)$ subterms. By the proof of Lemma 10,
only unit clauses are generated, so that their number is $O((m + k)^2)$. Since the size
of the database of clauses during the derivation is bound by a constant $h$ which is
$O((m + k)^2)$, and each inference takes polynomial time in $h$, the overall procedure
is polynomial. □

3.4 The theory of possibly empty lists 

Different presentations were proposed for a theory of lists. A "convex theory of
cons, car and cdr," was studied by [Shostak 1984], and therefore it is named $\mathcal{L}_{Sh}$.
Its signature contains cons, car and cdr, and its axioms are:

$\forall x, y.\ \mathrm{car}(\mathrm{cons}(x, y)) \simeq x$ (12)

$\forall x, y.\ \mathrm{cdr}(\mathrm{cons}(x, y)) \simeq y$ (13)

$\forall y.\ \mathrm{cons}(\mathrm{car}(y), \mathrm{cdr}(y)) \simeq y$ (14)

The presentation adopted by [Nelson and Oppen 1980], hence called $\mathcal{L}_{NO}$, adds
the predicate symbol atom to the signature, and the axioms

$\forall x, y.\ \lnot\,\mathrm{atom}(\mathrm{cons}(x, y))$ (15)

$\forall y.\ \lnot\,\mathrm{atom}(y) \supset \mathrm{cons}(\mathrm{car}(y), \mathrm{cdr}(y)) \simeq y$ (16)

to axioms (12) and (13).

A third presentation also appeared in [Nelson and Oppen 1980], but was not used
in their congruence-closure-based algorithm. Its signature features the constant
symbol nil, together with cons, car and cdr, but not atom. This presentation, that
we call $\mathcal{L}$, adds to (12) and (13) the following four axioms:

$\forall x, y.\ \mathrm{cons}(x, y) \not\simeq \mathrm{nil}$ (17)

$\forall y.\ y \not\simeq \mathrm{nil} \supset \mathrm{cons}(\mathrm{car}(y), \mathrm{cdr}(y)) \simeq y$ (18)

$\mathrm{car}(\mathrm{nil}) \simeq \mathrm{nil}$ (19)

$\mathrm{cdr}(\mathrm{nil}) \simeq \mathrm{nil}$ (20)

$\mathcal{L}$ is not convex, because $y \simeq \mathrm{nil} \lor \mathrm{cons}(\mathrm{car}(y), \mathrm{cdr}(y)) \simeq y$ is in $Th\,\mathcal{L}$, but neither
disjunct is.
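The axioms of this third presentation, and the failure of convexity, can be observed on a concrete structure. The sketch below assumes, for illustration, the structure whose elements are nil and the finite binary trees built from nil with cons, encoded as nested Python tuples. It checks axioms (12), (13) and (17)-(20) on a finite sample only, so it is a sanity check rather than a proof; all names are hypothetical.

```python
NIL = "nil"


def cons(x, y):
    return ("cons", x, y)


def car(y):
    # car(nil) = nil by axiom (19); otherwise project the first component.
    return NIL if y == NIL else y[1]


def cdr(y):
    # cdr(nil) = nil by axiom (20); otherwise project the second component.
    return NIL if y == NIL else y[2]


def sample_elements(depth):
    """Return nil together with all cons-trees of nesting depth at most `depth`."""
    elems = [NIL]
    for _ in range(depth):
        elems = [NIL] + [cons(x, y) for x in elems for y in elems]
    return elems


ELEMS = sample_elements(2)

AXIOMS_HOLD = all(
    car(cons(x, y)) == x and cdr(cons(x, y)) == y and cons(x, y) != NIL
    for x in ELEMS for y in ELEMS
) and all(
    y == NIL or cons(car(y), cdr(y)) == y for y in ELEMS
) and car(NIL) == NIL and cdr(NIL) == NIL

# The disjunction holds for every sampled element, yet each disjunct fails somewhere.
FIRST_DISJUNCT_EVERYWHERE = all(y == NIL for y in ELEMS)
SECOND_DISJUNCT_EVERYWHERE = all(cons(car(y), cdr(y)) == y for y in ELEMS)
```

On this sample the first disjunct fails at `cons(nil, nil)` and the second fails at `nil`, since `cons(car(nil), cdr(nil))` is `cons(nil, nil)`, which differs from `nil` by axiom (17).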

Unlike the presentation of records given earlier, and that of arrays, that will be
given in the next section, these presentations of lists are unsorted, or lists and their
elements belong to the same sort. This is desirable because it allows lists of lists.
Also, neither $\mathcal{L}_{Sh}$ nor $\mathcal{L}_{NO}$ nor $\mathcal{L}$ exclude cyclic lists (that is, a model of any one
of these presentations can satisfy $\mathrm{car}(x) \simeq x$). The rewriting approach was already
applied to both $\mathcal{L}_{Sh}$ and $\mathcal{L}_{NO}$ in [Armando et al. 2003]. The following analysis
shows that it applies to $\mathcal{L}$ as well.

Definition 10. A CSO $\succ$ is $\mathcal{L}$-good if (1) $t \succ c$ for all ground compound terms
$t$ and constants $c$, (2) $t \succ \mathrm{nil}$ for all terms $t$ whose root symbol is cons.

It is sufficient to impose a precedence $>$, such that function symbols are greater
than constant symbols, including $\mathrm{cons} > \mathrm{nil}$, to make an RPO, or a KBO with a
simple weighting scheme (e.g., weight given by arity), $\mathcal{L}$-good. No $\mathcal{L}$-reduction is
needed, and the key result is the following:

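Lemma 11. All clauses in the limit $S_\infty$ of the derivation $S_0 \vdash_{\mathcal{SP}} S_1 \ldots S_i \vdash_{\mathcal{SP}} \ldots$
generated by a fair $\mathcal{L}$-good $\mathcal{SP}_\succ$-strategy from $S_0 = \mathcal{L} \cup S$, where $S$ is a set of
ground flat $\mathcal{L}$-literals, belong to one of the following classes, where $c_i$ and $d_i$ for all
$i$, $1 \le i \le n$, and $e_1, e_2, e_3$ are constants (constants include nil):

i) the empty clause;
ii) the clauses in $\mathcal{L}$:

ii.a) $\mathrm{car}(\mathrm{cons}(x, y)) \simeq x$,

ii.b) $\mathrm{cdr}(\mathrm{cons}(x, y)) \simeq y$,

ii.c) $\mathrm{cons}(x, y) \not\simeq \mathrm{nil}$,

ii.d) $\mathrm{cons}(\mathrm{car}(y), \mathrm{cdr}(y)) \simeq y \lor y \simeq \mathrm{nil}$,

ii.e) $\mathrm{car}(\mathrm{nil}) \simeq \mathrm{nil}$,

ii.f) $\mathrm{cdr}(\mathrm{nil}) \simeq \mathrm{nil}$;

iii) ground flat unit clauses of the form:

iii.a) $c_1 \simeq c_2$,

iii.b) $c_1 \not\simeq c_2$,

iii.c) $\mathrm{car}(c_1) \simeq c_2$,

iii.d) $\mathrm{cdr}(c_1) \simeq c_2$,

iii.e) $\mathrm{cons}(c_1, c_2) \simeq c_3$;
iv) non-unit clauses of the following form:

iv.a) $\mathrm{cons}(e_1, \mathrm{cdr}(e_2)) \simeq e_3 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,

iv.b) $\mathrm{cons}(\mathrm{car}(e_1), e_2) \simeq e_3 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,

iv.c) $\mathrm{cons}(\mathrm{car}(e_1), \mathrm{cdr}(e_2)) \simeq e_3 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
iv.d) $\mathrm{cons}(e_1, e_2) \simeq e_3 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
iv.e) $\mathrm{car}(e_1) \simeq \mathrm{car}(e_2) \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
iv.f) $\mathrm{cdr}(e_1) \simeq \mathrm{cdr}(e_2) \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
iv.g) $\mathrm{car}(e_1) \simeq e_2 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
iv.h) $\mathrm{cdr}(e_1) \simeq e_2 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
iv.i) $\bigvee_{i=1}^{n} c_i \bowtie d_i$.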

Proof: since $\succ$ is an $\mathcal{L}$-good CSO, each clause in the above classes has a unique
maximal literal, which is the first one in the above listing (up to a permutation of
indices for (iv.i)). Furthermore, the left side in each maximal literal is maximal (for
(iii.a), (iii.b), (iv.e), (iv.f), (iv.i) this can be assumed without loss of generality).
The proof is by induction on the sequence $\{S_i\}_i$. For the base case, input clauses
are in (ii) or (iii). For the inductive case, we consider all classes in order:

— Inferences within (ii): All inferences between axioms generate clauses that get
deleted. Superposition of (ii.a) into (ii.d) generates $\mathrm{cons}(x, \mathrm{cdr}(\mathrm{cons}(x, y))) \simeq
\mathrm{cons}(x, y) \lor \mathrm{cons}(x, y) \simeq \mathrm{nil}$, which is simplified by (ii.b) to $\mathrm{cons}(x, y) \simeq \mathrm{cons}(x, y) \lor
\mathrm{cons}(x, y) \simeq \mathrm{nil}$, which is deleted. Superposition of (ii.d) into (ii.a) produces
$\mathrm{car}(y) \simeq \mathrm{car}(y) \lor y \simeq \mathrm{nil}$, which is deleted. Superposition of (ii.b) into (ii.d) yields
$\mathrm{cons}(\mathrm{car}(\mathrm{cons}(x, y)), y) \simeq \mathrm{cons}(x, y) \lor \mathrm{cons}(x, y) \simeq \mathrm{nil}$, whose simplification by
(ii.a) gives $\mathrm{cons}(x, y) \simeq \mathrm{cons}(x, y) \lor \mathrm{cons}(x, y) \simeq \mathrm{nil}$, which is deleted. Superposition
of (ii.d) into (ii.b) generates $\mathrm{cdr}(y) \simeq \mathrm{cdr}(y) \lor y \simeq \mathrm{nil}$, which gets deleted.
Paramodulation of (ii.d) into (ii.c) produces the tautology $y \simeq \mathrm{nil} \lor y \not\simeq \mathrm{nil}$, which
is eliminated by a step of reflection followed by one of deletion. Superposition of
(ii.e) into (ii.d) yields $\mathrm{cons}(\mathrm{nil}, \mathrm{cdr}(\mathrm{nil})) \simeq \mathrm{nil} \lor \mathrm{nil} \simeq \mathrm{nil}$, which is deleted.
Similarly, superposition of (ii.f) into (ii.d) yields $\mathrm{cons}(\mathrm{car}(\mathrm{nil}), \mathrm{nil}) \simeq \mathrm{nil} \lor \mathrm{nil} \simeq \mathrm{nil}$,
which is also deleted, and no other inferences apply among axioms.

— Inferences within (iii): Inferences on the maximal terms in (iii) can generate only
more ground flat unit clauses like those in (iii) or the empty clause.

— Inferences between a clause in (iii) and a clause in (ii): Inferences between an
axiom and a ground flat unit clause generate either more ground flat unit clauses
or non-unit clauses in the classes (iv.a) and (iv.b). Indeed, the only applicable
inferences are: superposition of a unit of kind (iii.c) into (ii.d), which gives
a clause in the form $\mathrm{cons}(c_2, \mathrm{cdr}(c_1)) \simeq c_1 \lor c_1 \simeq \mathrm{nil}$ of class (iv.a);
superposition of a unit of kind (iii.d) into (ii.d), which gives a clause in the form
$\mathrm{cons}(\mathrm{car}(c_1), c_2) \simeq c_1 \lor c_1 \simeq \mathrm{nil}$ of class (iv.b); superposition of a unit of kind
(iii.e) into (ii.a), (ii.b), (ii.c), which generates unit clauses in (iii.c), (iii.d) and
(iii.b), respectively.

— Inferences between a clause in (iv) and a clause in (ii): We consider the clauses
in (ii) in order. For (ii.a): superposing a clause of kind (iv.a) or (iv.d) into (ii.a)
generates $\mathrm{car}(e_3) \simeq e_1 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$, that is in (iv.g); superposing a clause of
kind (iv.b) or (iv.c) into (ii.a) generates $\mathrm{car}(e_1) \simeq \mathrm{car}(e_3) \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$, that
is in (iv.e). For (ii.b): superposing a clause of kind (iv.a) or (iv.c) into (ii.b)
generates $\mathrm{cdr}(e_2) \simeq \mathrm{cdr}(e_3) \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$, that is in (iv.f); superposing a clause
of kind (iv.b) or (iv.d) into (ii.b) generates $\mathrm{cdr}(e_3) \simeq e_2 \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$, that
is in (iv.h). Paramodulating clauses of classes (iv.a), (iv.b), (iv.c) and (iv.d)
into (ii.c) gives clauses in class (iv.i). For (ii.d): a superposition of (ii.d) into
(iv.c) or (iv.c) into (ii.d) yields a clause in (iv.i); a superposition of (iv.e) or
(iv.f) into (ii.d) produces $\mathrm{cons}(\mathrm{car}(e_2), \mathrm{cdr}(e_1)) \simeq e_1 \lor e_1 \simeq \mathrm{nil} \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$ or
$\mathrm{cons}(\mathrm{car}(e_1), \mathrm{cdr}(e_2)) \simeq e_1 \lor e_1 \simeq \mathrm{nil} \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$, that are in (iv.c); a
superposition of (iv.g) into (ii.d) produces $\mathrm{cons}(e_2, \mathrm{cdr}(e_1)) \simeq e_1 \lor e_1 \simeq \mathrm{nil} \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$,
that is in (iv.a); a superposition of (iv.h) into (ii.d) produces $\mathrm{cons}(\mathrm{car}(e_1), e_2) \simeq
e_1 \lor e_1 \simeq \mathrm{nil} \lor \bigvee_{i=1}^{n} c_i \bowtie d_i$, that is in (iv.b). Clause (ii.e) can simplify clauses
in (iv.b), (iv.c), (iv.e), (iv.g), to clauses in (iv.d), (iv.a), (iv.g), (iv.i), respectively.
Clause (ii.f) can simplify clauses in (iv.a), (iv.c), (iv.f), (iv.h), to clauses
in (iv.d), (iv.b), (iv.h), (iv.i), respectively. No other inferences apply.

— Inferences between a clause in (iv) and a clause in (iii): The only possible
expansion inference here is a paramodulation of a clause in (iv.i) into a clause in
(iii.b), which generates another clause in (iv.i). All other possible steps are
simplifications, where an equality of class (iii) reduces a clause in (iv) to another
clause in (iv).

— Inferences within (iv): Reflection applied to a clause in (iv) generates either the
empty clause or a clause in (iv). Equational factoring applies only to a clause
in (iv.i), to yield another clause of the same kind. The only other applicable
inferences are superpositions that generate more clauses in (iv). Specifically,
clauses of kind (iv.a) superpose with clauses in (iv.a), (iv.f), (iv.h) and (iv.i) to
generate clauses in (iv.i), (iv.a) and (iv.d). Clauses of kind (iv.b) superpose with
clauses in (iv.b), (iv.e), (iv.g) and (iv.i) to generate clauses in (iv.i), (iv.b) and
(iv.d). Clauses of kind (iv.c) superpose with clauses in (iv.c), (iv.e), (iv.f), (iv.g),
(iv.h) and (iv.i) to generate clauses in (iv.i), (iv.c), (iv.a) and (iv.b). Clauses of
kind (iv.d) superpose with clauses in (iv.d) and (iv.i) to generate clauses in (iv.i)
and (iv.d). Clauses of kind (iv.e) superpose with clauses in (iv.e), (iv.g) and
(iv.i) to generate clauses in (iv.e) and (iv.g). Clauses of kind (iv.f) superpose
with clauses in (iv.f), (iv.h) and (iv.i) to generate clauses in (iv.f) and (iv.h).
Clauses of kind (iv.g) superpose with clauses in (iv.g) and (iv.i) to generate
clauses in (iv.i) and (iv.g). Clauses of kind (iv.h) superpose with clauses in
(iv.h) and (iv.i) to generate clauses in (iv.i) and (iv.h). Clauses of kind (iv.i)
superpose with clauses in (iv.i) to generate clauses in (iv.i). □

It follows that the limit is finite and a fair derivation is bound to halt: 

Lemma 12. A fair $\mathcal{L}$-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate when applied
to $\mathcal{L} \cup S$, where $S$ is a set of ground flat $\mathcal{L}$-literals.

Theorem 3.5. A fair $\mathcal{L}$-good $\mathcal{SP}_\succ$-strategy is an exponential satisfiability
procedure for $\mathcal{L}$.

Proof: let $m$ be the number of subterms occurring in the input set of literals. After
flattening the number of subterms is $O(m)$. The types of clauses listed in Lemma 11
include literals of depth at most 2 (cf. (iv.a), (iv.b) and (iv.c)). Hence, at most
$h = O(m^3)$ distinct literals and at most $O(2^h)$ clauses can be generated. It follows
that the size of the set of clauses during the derivation is bounded by a constant $k$
which is $O(2^h)$. Since applying an inference takes polynomial time in $k$, the overall
complexity is $O(2^{m^3})$. □

Exponential complexity was expected, because it was shown already in [Nelson and
Oppen 1980] that the satisfiability problem for $\mathcal{L}$ is NP-complete.

3.5 The theory of arrays 

Let INDEX, ELEM and ARRAY be the sorts of indices, elements and arrays, respectively.
The signature has two function symbols, select : ARRAY × INDEX → ELEM,
and store : ARRAY × INDEX × ELEM → ARRAY, with the usual meaning. The standard
presentation, denoted $\mathcal{A}$, is made of two axioms, where $x$ is a variable of sort
ARRAY, $w$ and $z$ are variables of sort INDEX and $v$ is a variable of sort ELEM:

$$\forall x, z, v.\ \mathrm{select}(\mathrm{store}(x, z, v), z) \simeq v \tag{21}$$

$$\forall x, z, w, v.\ (z \not\simeq w \supset \mathrm{select}(\mathrm{store}(x, z, v), w) \simeq \mathrm{select}(x, w)) \tag{22}$$

This theory also is not convex, because $z \simeq w \lor \mathrm{select}(\mathrm{store}(x, z, v), w) \simeq \mathrm{select}(x, w)$
is valid in the theory, but neither disjunct is. For the theory of arrays with
extensionality, the presentation, named $\mathcal{A}^e$, includes also the extensionality axiom

$$\forall x, y.\ (\forall z.\ \mathrm{select}(x, z) \simeq \mathrm{select}(y, z) \supset x \simeq y) \tag{23}$$

where $x$ and $y$ are variables of sort ARRAY, and $z$ is a variable of sort INDEX.
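
As an illustration only, the following Python sketch checks axioms (21) and (22) on one small finite interpretation and exhibits witnesses showing that neither disjunct of the non-convexity example holds on its own. The two-element domains and the helper names (`axioms_hold`, `disjunct_counterexamples`) are assumptions made for this example; checking one finite interpretation is of course not a proof of validity.

```python
from itertools import product

def store(a, i, e):
    """Return a copy of array a (a tuple) with position i set to e."""
    return a[:i] + (e,) + a[i + 1:]

def select(a, i):
    """Read position i of array a."""
    return a[i]

INDICES = range(2)   # hypothetical finite index domain
ELEMS = range(2)     # hypothetical finite element domain
ARRAYS = list(product(ELEMS, repeat=len(INDICES)))

def axioms_hold():
    """Check axioms (21) and (22) on every assignment over the finite domains."""
    for x, z, w, v in product(ARRAYS, INDICES, INDICES, ELEMS):
        if select(store(x, z, v), z) != v:                          # axiom (21)
            return False
        if z != w and select(store(x, z, v), w) != select(x, w):    # axiom (22)
            return False
    return True

def disjunct_counterexamples():
    """Witnesses falsifying each disjunct of
    z = w  or  select(store(x, z, v), w) = select(x, w) taken alone."""
    first = next((z, w) for z, w in product(INDICES, INDICES) if z != w)
    second = next((x, z, w, v)
                  for x, z, w, v in product(ARRAYS, INDICES, INDICES, ELEMS)
                  if select(store(x, z, v), w) != select(x, w))
    return first, second
```

The second witness necessarily has equal indices, since axiom (22) forces the two reads to agree whenever the indices differ.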

Definition 11. A set of ground $\mathcal{A}$-literals is $\mathcal{A}$-reduced if it contains no literal
$l \not\simeq r$, where $l$ and $r$ are terms of sort ARRAY.

Given a set of ground $\mathcal{A}$-literals $S$, $\mathcal{A}$-reduction consists of replacing every literal
$l \not\simeq r \in S$, where $l$ and $r$ are terms of sort ARRAY, by
$\mathrm{select}(l, sk_{l,r}) \not\simeq \mathrm{select}(r, sk_{l,r})$,
where $sk_{l,r}$ is a Skolem constant of sort INDEX. The resulting $\mathcal{A}$-reduced form of $S$,
denoted $Red_{\mathcal{A}}(S)$, is related to the original problem by the following (cf. Lemma
7.1 in [Armando et al. 2003]):

Lemma 13. (Armando, Ranise and Rusinowitch 2003) Let $S$ be a set of ground
$\mathcal{A}$-literals. $\mathcal{A}^e \cup S$ is satisfiable if and only if $\mathcal{A} \cup Red_{\mathcal{A}}(S)$ is.
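
The reduction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the literal encoding `(lhs, rel, rhs, sort)`, the tuple encoding of terms, and the Skolem names `sk0`, `sk1`, ... are all assumptions made for the example.

```python
def a_reduce(literals):
    """A-reduction sketch: replace each array disequality l != r by
    select(l, sk) != select(r, sk), with sk a fresh Skolem index per pair (l, r).

    literals: list of (lhs, rel, rhs, sort) with rel in {"=", "!="};
    terms are strings (constants) or tuples (compound terms).
    """
    skolems = {}          # (lhs, rhs) -> name of its Skolem constant
    reduced = []
    for lhs, rel, rhs, sort in literals:
        if rel == "!=" and sort == "ARRAY":
            sk = skolems.setdefault((lhs, rhs), f"sk{len(skolems)}")
            reduced.append((("select", lhs, sk), "!=", ("select", rhs, sk), "ELEM"))
        else:
            reduced.append((lhs, rel, rhs, sort))   # all other literals are kept
    return reduced
```

For instance, `a_reduce([("a", "!=", "b", "ARRAY")])` yields a single disequality of sort ELEM between `select(a, sk0)` and `select(b, sk0)`, so the output contains no disequality of sort ARRAY.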

Definition 12. A CSO $\succ$ is $\mathcal{A}$-good if (1) $t \succ c$ for all ground compound terms
$t$ and constants $c$, and (2) $a \succ e \succ j$, for all constants $a$ of sort ARRAY, $e$ of sort
ELEM and $j$ of sort INDEX.

If $\succ$ is an RPO, it is sufficient to impose a precedence $>$, such that function symbols
are greater than constant symbols, and $a > e > j$ for all constants $a$ of sort ARRAY,
$e$ of sort ELEM and $j$ of sort INDEX, for $\succ$ to be $\mathcal{A}$-good. If it is a KBO, the same
precedence and a simple choice of weights will do.

Lemma 14. All clauses in the limit $S_\infty$ of the derivation $S_0 \vdash_{\mathcal{SP}} S_1 \ldots S_i \vdash_{\mathcal{SP}} \ldots$
generated by a fair $\mathcal{A}$-good $\mathcal{SP}_\succ$-strategy from $S_0 = \mathcal{A} \cup S$, where $S$ is an $\mathcal{A}$-reduced
set of ground flat $\mathcal{A}$-literals, belong to one of the following classes, where $a, a'$ are
constants of sort ARRAY, $i, i_1, \ldots, i_n, i'_1, \ldots, i'_n, j_1, \ldots, j_m, j'_1, \ldots, j'_m$ are constants
of sort INDEX ($n, m \geq 0$), $e, e'$ are constants of sort ELEM, and $c_1, c_2$ are constants
of either sort INDEX or sort ELEM:

i) the empty clause;
ii) the clauses in $\mathcal{A}$:
ii.a) $\mathrm{select}(\mathrm{store}(x, z, v), z) \simeq v$ and
ii.b) $\mathrm{select}(\mathrm{store}(x, z, v), w) \simeq \mathrm{select}(x, w) \lor z \simeq w$;
iii) ground flat unit clauses of the form:
iii.a) $a \simeq a'$,
iii.b) $c_1 \simeq c_2$,
iii.c) $c_1 \not\simeq c_2$,
iii.d) $\mathrm{store}(a, i, e) \simeq a'$,
iii.e) $\mathrm{select}(a, i) \simeq e$; and
iv) non-unit clauses of the following form:
iv.a) $\mathrm{select}(a, x) \simeq \mathrm{select}(a', x) \lor x \simeq i_1 \lor \ldots \lor x \simeq i_n \lor j_1 \bowtie j'_1 \lor \ldots \lor j_m \bowtie j'_m$, for $x \in$ INDEX,
iv.b) $\mathrm{select}(a, i) \simeq e \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$,
iv.c) $e \simeq e' \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$,
iv.d) $e \not\simeq e' \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$,
iv.e) $i_1 \simeq i'_1 \lor i_2 \bowtie i'_2 \lor \ldots \lor i_n \bowtie i'_n$,
iv.f) $i_1 \not\simeq i'_1 \lor i_2 \bowtie i'_2 \lor \ldots \lor i_n \bowtie i'_n$,
iv.g) $t \simeq a' \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$, where $t$ is either $a$ or $\mathrm{store}(a, i, e)$.

Proof: we recall that inequalities $a \not\simeq a'$ are not listed in (iii), because $S$ is
$\mathcal{A}$-reduced. Since $\succ$ is total on ground terms and $\mathcal{A}$-good, each clause in the above
classes has a unique maximal literal, which is the first one in the above listing (up
to a permutation of indices for (iv.e) and (iv.f)). Classes (iv.e) and (iv.f) are really
one class separated in two classes based on the sign of the maximal literal. The
proof is by induction on the sequence $\{S_i\}_i$. For the base case, input clauses are in
(ii) or (iii). For the inductive case, we have:

— Inferences within (ii): The only inference that applies to the axioms in $\mathcal{A}$ is
a superposition of (ii.a) into (ii.b) that generates the trivial clause
$z \simeq z \lor \mathrm{select}(x, z) \simeq v$, which is eliminated by deletion.

— Inferences within (iii): Inferences between ground flat unit clauses can produce 
only ground flat unit clauses in (iii) or the empty clause. 

— Inferences between a clause in (iii) and a clause in (ii): Superposition of (iii.d)
$\mathrm{store}(a, i, e) \simeq a'$ into (ii.a) $\mathrm{select}(\mathrm{store}(x, z, v), z) \simeq v$ yields $\mathrm{select}(a', i) \simeq e$
which is in (iii.e). Superposition of (iii.d) into (ii.b)
$\mathrm{select}(\mathrm{store}(x, z, v), w) \simeq \mathrm{select}(x, w) \lor z \simeq w$ yields
$\mathrm{select}(a', w) \simeq \mathrm{select}(a, w) \lor i \simeq w$ which is in (iv.a).

— Inferences between a clause in (iv) and a clause in (ii): Superposition of (iv.g)
$\mathrm{store}(a, i, e) \simeq a' \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$ into (ii.a) yields
$\mathrm{select}(a', i) \simeq e \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$, which is in (iv.b).
Superposition of (iv.g) into (ii.b) yields
$\mathrm{select}(a', w) \simeq \mathrm{select}(a, w) \lor i \simeq w \lor i_1 \bowtie i'_1 \lor \ldots \lor i_n \bowtie i'_n$, which is in (iv.a). No
other inferences apply.

— Inferences between a clause in (iv) and a clause in (iii): For an inference to apply
to (iii.a) and (iv) it must be that $a$ (or $a'$) appears in a clause in (iv). Similarly,
for an inference to apply to (iii.b) and (iv) it must be that $c_1$ (or $c_2$) appears in
a clause in (iv). In either case, simplification of the clause of class (iv) by the
clause of class (iii) applies. Such a step can only generate a clause in (iv). The
only inference that can apply to (iii.c) and (iv) is a paramodulation of a clause
in (iv) into a clause in (iii.c). If $c_1, c_2 \in$ ELEM, paramodulation of (iv.c) into
(iii.c) generates a clause in (iv.d). If $c_1, c_2 \in$ INDEX, paramodulation of (iv.e)
into (iii.c) produces a clause in (iv.f) or (iv.e), depending on the sign of the
maximal literal in the resulting clause. We consider next (iii.d) and (iv). The
only possible application of simplification consists of applying (iii.d) to reduce a
clause in (iv.g) to a clause in the same class. A superposition of (iv.c) or (iv.e)
into (iii.d) generates a clause in (iv.g). No other inferences are possible. Last
come (iii.e) and (iv). As a simplifier, (iii.e) may apply only to (iv.b) to yield a
clause in (iv.c). All possible superpositions, namely superposition of (iii.e) and
(iv.a), superposition of (iv.e) into (iii.e), and superposition of (iv.g) into (iii.e)
give clauses of class (iv.b).

— Inferences within (iv): Equational factoring applies only to a clause of class 
(iv.e) to yield a clause in (iv.e) or (iv.f). Reflection applies to a clause in (iv.d) 
or (iv.f), to yield a clause in one of (iii.b), (iii.c), (iv.e) or (iv.f). Then, for 
each kind of clause we consider all binary inferences it can have with clauses 
that follow in the list. We begin with (iv.a): superposition of (iv.a) and (iv.a)
gives (iv.a); superposition of (iv.a) and (iv.b) gives (iv.b); superposition of (iv.g) 
into (iv.a) gives (iv.a). Second comes (iv.b): superposition of (iv.b) and (iv.b) 
gives (iv.c); superposition of (iv.e) into (iv.b) gives (iv.b); superposition of (iv.g) 
into (iv.b) gives (iv.b). Next there is (iv.c): superposition of (iv.c) and (iv.c) 
gives (iv.c); paramodulation of (iv.c) into (iv.d) gives (iv.d); superposition of 
(iv.c) into (iv.g) gives (iv.g). For (iv.e) and (iv.f), we have: superposition of 
(iv.e) and (iv.e) gives (iv.e) or (iv.f); paramodulation of (iv.e) into (iv.f) gives 
(iv.e) or (iv.f); superposition of (iv.e) into (iv.g) gives (iv.g). Last, all possible 
applications of superposition within (iv.g) give (iv.g). □ 

Thus, we have (cf. Lemma 7.3 and Theorem 7.2 in [Armando et al. 2003]): 

Lemma 15. (Armando, Ranise and Rusinowitch 2003) A fair $\mathcal{A}$-good $\mathcal{SP}_\succ$-strategy
is guaranteed to terminate when applied to $\mathcal{A} \cup S$, where $S$ is an $\mathcal{A}$-reduced
set of ground flat $\mathcal{A}$-literals.

Theorem 3.6. (Armando, Ranise and Rusinowitch 2003) A fair $\mathcal{A}$-good $\mathcal{SP}_\succ$-strategy
is an exponential satisfiability procedure for $\mathcal{A}$ and $\mathcal{A}^e$.

4. REWRITE-BASED SATISFIABILITY: COMBINATION OF THEORIES 

A big-engines approach is especially well-suited for the combination of theories, 
because it makes it possible to combine presentations rather than algorithms. The 
inference engine is the same for all theories considered, and studying a combination 
of theories amounts to studying the behavior of the inference engine on a problem 
in the combination. In a little-engines approach, on the other hand, there is in 
principle a different engine for each theory, and studying a combination of theories 
may require studying the interactions among different inference engines. 

In the rewrite-based methodology, the combination problem is the problem of
showing that an $\mathcal{SP}_\succ$-strategy decides $\mathcal{T}$-satisfiability, where $\mathcal{T} = \bigcup_{i=1}^{n} \mathcal{T}_i$,
knowing that it decides $\mathcal{T}_i$-satisfiability for all $i$, $1 \leq i \leq n$. Since $\mathcal{T}_i$-reduction applies
separately for each theory, and flattening is harmless, one only has to prove
termination. The main theorem in this section establishes sufficient conditions for
$\mathcal{SP}_\succ$ to terminate on $\mathcal{T}$-satisfiability problems if it terminates on $\mathcal{T}_i$-satisfiability
problems for all $i$, $1 \leq i \leq n$. A first condition is that the ordering $\succ$ be $\mathcal{T}$-good:

Definition 13. Let $\mathcal{T}_1, \ldots, \mathcal{T}_n$ be presentations of theories. A CSO $\succ$ is $\mathcal{T}$-good,
where $\mathcal{T} = \bigcup_{i=1}^{n} \mathcal{T}_i$, if it is $\mathcal{T}_i$-good for all $i$, $1 \leq i \leq n$.

The second condition will serve the purpose of excluding paramodulations from
variables, when considering inferences across theories. This is key, since a variable
may paramodulate into any proper non-variable subterm:

Definition 14. A clause $C$ is variable-inactive for $\succ$ if no maximal literal in
$C$ is an equation $t \simeq x$ where $x \notin Var(t)$. A set of clauses is variable-inactive for
$\succ$ if all its clauses are.

Definition 15. A theory presentation $\mathcal{T}$ is variable-inactive for $\mathcal{SP}_\succ$ if the limit
$S_\infty$ of any fair $\mathcal{SP}_\succ$-derivation from $S_0 = \mathcal{T} \cup S$ is variable-inactive for $\succ$.

For satisfiability problems, $S$ is ground, hence immaterial for variable-inactivity.
If axioms persist, as generally expected, $\mathcal{T} \subseteq S_\infty$, and Definition 15 requires that
they are variable-inactive. If they do not persist, they are irrelevant, because a fair
strategy does not need to perform inferences from clauses that do not persist.

The third condition is that the signatures do not share function symbols, which 
excludes paramodulations from compound terms. Sharing of constant symbols, 
including those introduced by flattening, is allowed. Thus, the only inferences 
across theories are paramodulations from constants into constants, that are finitely 
many: 

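Theorem 4.1. Let $\mathcal{T}_1, \ldots, \mathcal{T}_n$ be presentations of theories, with no shared function
symbol, and let $\mathcal{T} = \bigcup_{i=1}^{n} \mathcal{T}_i$. Assume that for all $i$, $1 \leq i \leq n$, $S_i$ is a
$\mathcal{T}_i$-reduced set of ground flat $\mathcal{T}_i$-literals. If for all $i$, $1 \leq i \leq n$, a fair $\mathcal{T}_i$-good
$\mathcal{SP}_\succ$-strategy is guaranteed to terminate on $\mathcal{T}_i \cup S_i$, and $\mathcal{T}_i$ is variable-inactive for $\mathcal{SP}_\succ$,
then a fair $\mathcal{T}$-good $\mathcal{SP}_\succ$-strategy is guaranteed to terminate on $\mathcal{T} \cup S_1 \cup \ldots \cup S_n$.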

Proof: let $S^i_\infty$ be the set of persistent clauses generated by $\mathcal{SP}_\succ$ from $\mathcal{T}_i \cup S_i$.
Since $\mathcal{SP}_\succ$ terminates on $\mathcal{T}_i \cup S_i$, for all $i$, $1 \leq i \leq n$, we are concerned only
with binary expansion inferences between a clause in $S^i_\infty$ and a clause in $S^j_\infty$, with
$1 \leq i \neq j \leq n$. We consider first paramodulations from variables. Assume that
a literal $t \simeq x$ occurs in a clause $C$ in $S^i_\infty$. If $x \in Var(t)$, it is $t \succ x$ by the
subterm property of the CSO, and therefore, there is no paramodulation from $x$.
If $x \notin Var(t)$, $t \simeq x$ is not maximal in $C$, because $S^i_\infty$ is variable-inactive by
hypothesis. In other words, there is another literal $L$ in $C$ such that $L \succ t \simeq x$. By
stability of $\succ$, $L\sigma \succ (t \simeq x)\sigma$ for all substitutions $\sigma$. Thus, no instance $(t \simeq x)\sigma$
can be maximal, so that, again, there is no paramodulation from $x$. Therefore,
there are no paramodulations from variables. Since there are no shared function
symbols, no paramodulation from a compound term applies to a clause in $S^i_\infty$ and
a clause in $S^j_\infty$. The only possible inferences are those where a clause $a \simeq t \lor C$
paramodulates into a clause $l[a] \bowtie u \lor D$, where $a$ is a constant, $t$ is also a constant
(it cannot be a variable, because $a \simeq t \lor C$ is variable-inactive, and it cannot be a
compound term, because $\succ$ is stable and good), the context $l$ may be empty, the
mgu is empty, and $C$ and $D$ are disjunctions of literals. Since there are only finitely
many constants, only finitely many such steps may apply. □

Corollary 3. Let $\mathcal{T}_1, \ldots, \mathcal{T}_n$ be presentations of theories, with no shared function
symbol, and let $\mathcal{T} = \bigcup_{i=1}^{n} \mathcal{T}_i$. If for all $i$, $1 \leq i \leq n$, a fair $\mathcal{T}_i$-good $\mathcal{SP}_\succ$-strategy
is a satisfiability procedure for $\mathcal{T}_i$, and $\mathcal{T}_i$ is variable-inactive for $\mathcal{SP}_\succ$, then a fair
$\mathcal{T}$-good $\mathcal{SP}_\succ$-strategy is a satisfiability procedure for $\mathcal{T}$.

The requirement of being variable-inactive is rather natural for equational theo- 
ries: 

Theorem 4.2. If $\mathcal{T}$ is a presentation of an equational theory with no trivial
models, then $\mathcal{T}$ is variable-inactive for $\mathcal{SP}_\succ$ for any CSO $\succ$.

Proof: by way of contradiction, assume that $\mathcal{T}$ is not variable-inactive, that is, for
some variable-inactive $S_0$, $S_\infty$ is not variable-inactive. Thus, there is an equation
$t \simeq x \in S_\infty$ such that $x \notin Var(t)$. Since $\mathcal{SP}$ is sound, $S_0 \models t \simeq x$. An equation
$t \simeq x$ such that $x \notin Var(t)$ is satisfied only by a trivial model. Thus, $S_0$ has only
trivial models. Since $\mathcal{T} \subseteq S_0$, a model of $S_0$ is also a model of $\mathcal{T}$. It follows that $\mathcal{T}$
has trivial models, contrary to the hypothesis. □

Given an equational presentation $\mathcal{T}$, the addition of the axiom $\exists x \exists y\ x \not\simeq y$ is
sufficient to exclude trivial models. Since the clausal form of this axiom is the ground
flat literal $sk_1 \not\simeq sk_2$, where $sk_1$ and $sk_2$ are two Skolem constants, this addition
preserves all termination results for $\mathcal{SP}_\succ$ on $\mathcal{T}$-satisfiability problems.

For Horn theories, refutational completeness is preserved if $\mathcal{SP}_\succ$ is specialized to
a maximal unit strategy, that restricts superposition to unit clauses and paramodulates
unit clauses into maximal negative literals [Dershowitz 1991]. Equational
factoring is not needed for Horn theories. This strategy resembles positive unit
resolution in the non-equational case and has the same character of a purely
forward-reasoning strategy. At the limit, all proofs in $S_\infty$ are valley proofs, that is,
equational rewrite proofs in the form $u \rightarrow^{*} \circ \leftarrow^{*} t$. It follows that all non-unit clauses
are redundant in $S_\infty$:

Theorem 4.3. If $\mathcal{T}$ is a presentation of a Horn equational theory with no trivial
models, then $\mathcal{T}$ is variable-inactive for $\mathcal{SP}_\succ$ for any CSO $\succ$ and the maximal unit
strategy.

Proof: it is the same as for Theorem 4.2, because $S_\infty$ only contains unit clauses. □

For first-order theories, the requirement that $S_\infty$ be variable-inactive excludes the
generation of clauses in the form $a_1 \simeq x \lor \ldots \lor a_n \simeq x$, where for all $i$, $1 \leq i \leq n$, $a_i$
is a constant. Such a disjunction may be generated, but only within a clause that
contains at least one greater literal, such as one involving function symbols (e.g.,
clauses of type (iv.a) in Lemma 14).

Theorem 4.4. Let $\mathcal{T}$ be a presentation of a first-order theory: if
$a_1 \simeq x \lor \ldots \lor a_n \simeq x \in S_\infty$, where $S_\infty$ is the limit of any fair $\mathcal{SP}_\succ$-derivation
from $S_0 = \mathcal{T} \cup S$, for any CSO $\succ$, then $\mathrm{Th}\,\mathcal{T}$ is not stably infinite. Furthermore,
if $\mathcal{T}$ has no trivial models, $\mathrm{Th}\,\mathcal{T}$ is also not convex.

Proof: since $\mathcal{SP}$ is sound, $a_1 \simeq x \lor \ldots \lor a_n \simeq x \in S_\infty$ implies
$S_0 \models \forall x\ a_1 \simeq x \lor \ldots \lor a_n \simeq x$. It follows that $S_0$ has no infinite model. On the other hand,
$a_1 \simeq x \lor \ldots \lor a_n \simeq x \in S_\infty$ implies that $S_0$ is satisfiable, because if $S_0$ were
unsatisfiable, by the refutational completeness of $\mathcal{SP}$, $S_\infty$ would contain only the
empty clause. Thus, $S_0$ has models, but has no infinite model. Equivalently, $S$
has $\mathcal{T}$-models, but has no infinite $\mathcal{T}$-model. This means that $\mathrm{Th}\,\mathcal{T}$ is not stably
infinite, and, if it has no trivial models, it is also not convex by Theorem 2.1. □

In other words, if $\mathcal{T}$ is not variable-inactive for $\mathcal{SP}_\succ$, because it generates a clause
in the form $a_1 \simeq x \lor \ldots \lor a_n \simeq x$, then $\mathcal{T}$ is not stably infinite either.

The notion of a clause in the form $a_1 \simeq x \lor \ldots \lor a_n \simeq x$ was "lifted" in [Bonacina
et al. 2006] to those of variable clause and cardinality constraint clause. A variable
clause is a clause containing only equations between variables or their negations.
The antecedent-mgu (a-mgu, for short) of a variable clause $C$ is the most general
unifier of the unification problem $\{x =^{?} y : x \not\simeq y \in C\}$. Then, a variable clause $C$ is
a cardinality constraint clause, if $C^{+}\mu$ is not empty and contains no trivial equation
$x \simeq x$, where $\mu$ is the a-mgu of $C$ and $C^{+}$ is made of the positive literals in $C$. This
notion allows one to prove the following (cf. Lemma 5.2 in [Bonacina et al. 2006]):

Lemma 16. (Bonacina, Ghilardi, Nicolini, Ranise and Zucchelli 2006) If $S_0$ is
a finite satisfiable set of clauses, then $S_0$ admits no infinite models if and only if
the limit $S_\infty$ of any fair $\mathcal{SP}_\succ$-derivation from $S_0$ contains a cardinality constraint
clause.
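
The definition of cardinality constraint clause can be made concrete with a small Python sketch. It assumes a hypothetical encoding of a variable clause as a list of triples `(x, rel, y)` with `rel` in `{"=", "!="}`, and computes the a-mgu with a simple union-find over variable names; the function names are chosen for this example only.

```python
def a_mgu(clause):
    """Antecedent-mgu sketch: unify x and y for every negative literal x != y.
    Returns a substitution mapping each variable to its representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for x, rel, y in clause:
        rx, ry = find(x), find(y)
        if rel == "!=" and rx != ry:
            parent[rx] = ry           # merge the two classes of variables
    return {x: find(x) for x in list(parent)}

def is_cardinality_constraint(clause):
    """True if C+ mu is non-empty and contains no trivial equation x = x."""
    mu = a_mgu(clause)
    positives = [(mu[x], mu[y]) for x, rel, y in clause if rel == "="]
    return bool(positives) and all(s != t for s, t in positives)
```

Under this encoding, the clause with literals `z != y`, `x = y` and `z = w` passes the check, whereas the clause with literals `x != y` and `x = y` does not, because applying the a-mgu turns its only positive literal into a trivial equation.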

Next, we note that a cardinality constraint clause cannot be variable-inactive,
because it must have some positive literal in the form $x \simeq y$ that is maximal. For
example, in $z \not\simeq y \lor x \simeq y \lor z \simeq w$, all three literals are maximal. Thus, it follows
that:

Theorem 4.5. If a first-order theory $\mathcal{T}$ is variable-inactive for $\mathcal{SP}_\succ$, then it is
stably-infinite.

Proof: assume that $\mathcal{T}$ is not stably-infinite. Then there exists a quantifier-free
$\mathcal{T}$-formula $\varphi$, that has a $\mathcal{T}$-model but no infinite $\mathcal{T}$-model. Let $S_0$ be the clausal form
of $\mathcal{T} \cup \{\varphi\}$: $S_0$ is finite, satisfiable and admits no infinite model. By Lemma 16, the
limit $S_\infty$ of a fair $\mathcal{SP}_\succ$-derivation from $S_0$ contains a cardinality constraint clause.
Thus, $S_\infty$, and hence $\mathcal{T}$, is not variable-inactive for $\mathcal{SP}_\succ$. □

We conclude by applying Theorem 4.1 to any combination of the theories studied
in Section 3. The goodness requirement (Definition 13) is easily satisfied: any CSO
is good for $\mathcal{E}$, and it is simple to obtain an ordering that is simultaneously
$\mathcal{R}$-good, $\mathcal{I}$-good, $\mathcal{L}$-good and $\mathcal{A}$-good. The reductions of $\mathcal{R}^e$ to $\mathcal{R}$ (Lemma 1) and
$\mathcal{A}^e$ to $\mathcal{A}$ (Lemma 13) apply also when the signature contains free function symbols
$f : s_0, \ldots, s_{m-1} \rightarrow s_m$, $m \geq 1$, provided that none of the $s_i$, $1 \leq i \leq m$, is REC or
ARRAY, respectively. A function symbol satisfying this requirement is said to be
record-safe or array-safe, respectively. Thus, we have:

Theorem 4.6. A fair $\mathcal{SP}_\succ$-strategy is a satisfiability procedure for any combination
of the theories of records, with or without extensionality, integer offsets, integer
offsets modulo, possibly empty lists, arrays, with or without extensionality, and the
quantifier-free theory of equality, provided (1) $\succ$ is $\mathcal{R}$-good whenever records are
included, (2) $\succ$ is $\mathcal{I}$-good whenever integer offsets are included, (3) $\succ$ is $\mathcal{L}$-good
whenever lists are included and (4) $\succ$ is $\mathcal{A}$-good whenever arrays are included, and
(5) all free function symbols are array-safe (record-safe) whenever arrays (records)
with extensionality and the quantifier-free theory of equality are included.

Proof: for the quantifier-free theory of equality, $\mathcal{E}$ is vacuously variable-inactive for
$\mathcal{SP}_\succ$. For the other theories, the lists of clauses in Lemma 2, Lemma 7, Lemma 9,
Lemma 11 and Lemma 14, show that $\mathcal{R}$, $\mathcal{I}$, $\mathcal{I}_k$, $\mathcal{L}$ and $\mathcal{A}$, respectively, are
variable-inactive for $\mathcal{SP}_\succ$. Therefore, the result follows from Theorem 4.1. □

This theorem holds if $\mathcal{L}$ is replaced by $\mathcal{L}_{Sh}$ or $\mathcal{L}_{NO}$, since they are also
variable-inactive for $\mathcal{SP}_\succ$ (cf. Lemmata 4.1 and 5.1 in [Armando et al. 2003]).

Footnote (on the term antecedent-mgu): The name derives from the sequent-style notation for clauses adopted in [Bonacina et al. 2006].

5. SYNTHETIC BENCHMARKS 

This section presents six sets of synthetic benchmarks: three in the theory of arrays
with extensionality (STORECOMM, SWAP and STOREINV); one in the combination of the
theories of arrays and integer offsets (IOS); one in the combination of the theories
of arrays, records and integer offsets to model queues (QUEUE); and one in the
combination of the theories of arrays, records and integer offsets modulo to model
circular queues (CIRCULAR_QUEUE). Each problem set is parametric, that is, it is
formulated as a function Pb, that takes a positive integer $n$ as parameter, and
returns a set of formulae Pb($n$). For all these problems, the size of Pb($n$) grows
monotonically with $n$. This property makes them ideal to evaluate empirically how
a system's performance scales with the size of the input, as we shall do in Section 6.

5.1 First benchmark: STORECOMM(n) and STORECOMM_INVALID(n)

The problems of the STORECOMM family express the fact that the result of storing a 
set of elements in different positions within an array is not affected by the relative 
order of the store operations. For instance, for n = 2 the following valid formula 
belongs to STORECOMM(n): 

$$i_1 \not\simeq i_2 \supset \mathrm{store}(\mathrm{store}(a, i_1, e_1), i_2, e_2) \simeq \mathrm{store}(\mathrm{store}(a, i_2, e_2), i_1, e_1).$$

Here and in the following $a$ is a constant of sort ARRAY, $i_1, \ldots, i_n$ are constants of
sort INDEX, and $e_1, \ldots, e_n$ are constants of sort ELEM.

In general, let $n > 0$ and $p, q$ be permutations of $\{1, \ldots, n\}$. Let STORECOMM($n, p, q$)
be the formula:

$$\bigwedge_{(l,m) \in C^n_2} i_l \not\simeq i_m \supset (T_n(p) \simeq T_n(q))$$

where $C^n_2$ is the set of 2-combinations over $\{1, \ldots, n\}$ and

$$T_k(p) = \begin{cases} a & \text{if } k = 0, \\ \mathrm{store}(T_{k-1}(p), i_{p(k)}, e_{p(k)}) & \text{if } 1 \leq k \leq n. \end{cases}$$

Since only the relative position of the elements of $p$ with respect to those of $q$ is
relevant, $q$ can be fixed. For simplicity, let it be the identity permutation $\iota$. Then
STORECOMM($n$) = {STORECOMM($n, p, \iota$) : $p$ is a permutation of $\{1, \ldots, n\}$}.

Example 3. If $n = 3$ and $p$ is such that $p(1) = 3$, $p(2) = 1$, and $p(3) = 2$, then

$$T_n(p) = \mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_3, e_3), i_1, e_1), i_2, e_2)$$
$$T_n(\iota) = \mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_1, e_1), i_2, e_2), i_3, e_3)$$

and STORECOMM($n, p, \iota$) is

$$(i_1 \not\simeq i_2 \land i_2 \not\simeq i_3 \land i_1 \not\simeq i_3) \supset$$
$$\mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_3, e_3), i_1, e_1), i_2, e_2) \simeq \mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_1, e_1), i_2, e_2), i_3, e_3).$$

Each element of STORECOMM($n$), once negated, reduced to clausal form, $\mathcal{A}$-reduced
and flattened, leads to a problem whose number of clauses is in $O(n^2)$, because it
is dominated by the $\binom{n}{2} = \frac{n(n-1)}{2}$ clauses in $\{i_l \not\simeq i_m : (l, m) \in C^n_2\}$.
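
A generator for this family can be sketched in a few lines of Python. The textual syntax of the output (`!=`, `&`, `->`, `=`) and the function names are assumptions made for this illustration; they are not the input format of any particular prover, and the sketch assumes $n \geq 2$ so that the hypothesis is non-empty.

```python
from itertools import combinations, permutations

def nested_store(n, p):
    """Term T_n(p): n nested stores into a, following p (a tuple of 1-based indices)."""
    term = "a"
    for k in range(1, n + 1):
        term = f"store({term},i{p[k - 1]},e{p[k - 1]})"
    return term

def storecomm_formula(n, p, q):
    """STORECOMM(n, p, q): pairwise distinct indices imply T_n(p) = T_n(q)."""
    hyps = " & ".join(f"i{l}!=i{m}" for l, m in combinations(range(1, n + 1), 2))
    return f"({hyps}) -> {nested_store(n, p)} = {nested_store(n, q)}"

def storecomm(n):
    """All n! formulae of STORECOMM(n), with q fixed to the identity permutation."""
    iota = tuple(range(1, n + 1))
    return [storecomm_formula(n, p, iota) for p in permutations(iota)]
```

With `p = (3, 1, 2)` and `n = 3`, `nested_store` rebuilds the term $T_n(p)$ of Example 3, and the hypothesis contains the three disequalities counted by the quadratic bound above.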

A slight change in the definition of STORECOMM generates sets of formulae that are
not valid in $\mathcal{A}$:

STORECOMM_INVALID($n$) = {STORECOMM($n, p, \iota'$) : $p$ is a permutation of $\{1, \ldots, n\}$}

where $\iota' : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$ is such that $\iota'(k) = k$ for $1 \leq k \leq n - 1$ and
$\iota'(n) = 1$.

Example 4. For $n$, $p$ and $T_n(p)$ as in Ex. 3,

$$T_n(\iota') = \mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_1, e_1), i_2, e_2), i_1, e_1)$$

and STORECOMM_INVALID($n, p, \iota'$) is

$$(i_1 \not\simeq i_2 \land i_2 \not\simeq i_3 \land i_1 \not\simeq i_3) \supset$$
$$\mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_3, e_3), i_1, e_1), i_2, e_2) \simeq \mathrm{store}(\mathrm{store}(\mathrm{store}(a, i_1, e_1), i_2, e_2), i_1, e_1).$$

5.2 Second benchmark: SWAP(n) and SWAP_INVALID(n) 

An elementary property of arrays is that swapping an element at position $i_1$ with an
element at position $i_2$ is equivalent to swapping the element at position $i_2$ with the
element at position $i_1$. The problems of the SWAP family are based on generalizing
this observation to any number $n$ of swap operations. For instance, for $n = 2$ the
following valid fact is in SWAP($n$):

$$\mathrm{swap}(\mathrm{swap}(a, i_0, i_1), i_2, i_1) \simeq \mathrm{swap}(\mathrm{swap}(a, i_1, i_0), i_1, i_2)$$

where $\mathrm{swap}(a, i, j)$ abbreviates the term $\mathrm{store}(\mathrm{store}(a, i, \mathrm{select}(a, j)), j, \mathrm{select}(a, i))$.
In general, let $c_1, c_2$ be subsets of $\{1, \ldots, n\}$, and let $p, q$ be functions
$p, q : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$. Then, we define SWAP($n, c_1, c_2, p, q$) to be the formula:

$$T_n(c_1, p, q) \simeq T_n(c_2, p, q)$$

with $T_k(c, p, q)$ defined by

$$T_k(c, p, q) = \begin{cases} a & \text{if } k = 0, \\ \mathrm{swap}(T_{k-1}(c, p, q), i_{p(k)}, i_{q(k)}) & \text{if } 1 \leq k \leq n \text{ and } k \in c, \text{ and} \\ \mathrm{swap}(T_{k-1}(c, p, q), i_{q(k)}, i_{p(k)}) & \text{if } 1 \leq k \leq n \text{ and } k \notin c. \end{cases} \tag{25}$$

$T_n(c, p, q)$ is the array obtained by swapping the elements of position $p(k)$ and $q(k)$
of the array $a$ for $1 \leq k \leq n$. The role of the subset $c$ is to determine whether the
element at position $p(k)$ has to be swapped with that at position $q(k)$ or vice versa,
and has the effect of shuffling the indices within the formula.

Example 5. If $n = 3$, $c_1 = \{1\}$, $c_2 = \{2, 3\}$, $p$ and $q$ are such that $p(k) = k$ and
$q(k) = 2$, for all $k$, $1 \leq k \leq n$, then

$$T_n(c_1, p, q) = \mathrm{swap}(\mathrm{swap}(\mathrm{swap}(a, i_1, i_2), i_2, i_2), i_2, i_3)$$

$$T_n(c_2, p, q) = \mathrm{swap}(\mathrm{swap}(\mathrm{swap}(a, i_2, i_1), i_2, i_2), i_3, i_2).$$

Thus, SWAP($n$) = {SWAP($n, c_1, c_2, p, q$) : $c_1, c_2 \subseteq \{1, \ldots, n\}$ and $p, q : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$}.
Each formula, once negated, transformed into clausal form, $\mathcal{A}$-reduced
and flattened, leads to a problem with $O(n)$ clauses.
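
Definition (25) translates directly into a short recursive construction. The Python sketch below is an illustration under assumed conventions: terms are plain strings, `p` and `q` are passed as Python functions, and the helper names `swap_term` and `expand_swap` are introduced only for this example.

```python
def swap_term(n, c, p, q):
    """Term T_n(c, p, q) of (25): at step k, swap positions p(k), q(k) if k is in c,
    and q(k), p(k) otherwise."""
    term = "a"
    for k in range(1, n + 1):
        x, y = (p(k), q(k)) if k in c else (q(k), p(k))
        term = f"swap({term},i{x},i{y})"
    return term

def expand_swap(a, i, j):
    """Unfold the abbreviation swap(a, i, j) into store and select."""
    return f"store(store({a},{i},select({a},{j})),{j},select({a},{i}))"

def swap_formula(n, c1, c2, p, q):
    """SWAP(n, c1, c2, p, q) as an equation between two swap terms."""
    return f"{swap_term(n, c1, p, q)} = {swap_term(n, c2, p, q)}"
```

Calling `swap_term(3, {1}, lambda k: k, lambda k: 2)` reproduces the first term of Example 5, and replacing `{1}` with `{2, 3}` reproduces the second.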

A small change in the definition produces a class SWAP_INVALID. With $c_1, c_2, p, q$
defined as for SWAP, let SWAP_INVALID($n, c_1, c_2, p, q$) be the formula:

$$T_n(c_1, p, q) \simeq T_n(c_2, p, q')$$

where $T_k(c, p, q)$ is as in (25), $q' : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$ is such that
$q'(1) = (q(1) + 1) \bmod n$, and $q'(k) = q(k)$ for all $k$, $2 \leq k \leq n$. Then, SWAP_INVALID($n$) =
{SWAP_INVALID($n, c_1, c_2, p, q$) : $c_1, c_2 \subseteq \{1, \ldots, n\}$, $p, q : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$}.

Example 6. If $n$, $c_1$, $c_2$, $p$ and $q$ are as in Example 5,

$$T_n(c_1, p, q) = \mathrm{swap}(\mathrm{swap}(\mathrm{swap}(a, i_1, i_2), i_2, i_2), i_2, i_3)$$

$$T_n(c_2, p, q') = \mathrm{swap}(\mathrm{swap}(\mathrm{swap}(a, i_3, i_1), i_2, i_2), i_3, i_2).$$

5.3 Third benchmark: STOREINV(n) and STOREINV_INVALID(n)

The problems of the STOREINV family capture the following property: if the arrays 
resulting from swapping elements of array a with the elements of array b occurring 
in the same positions are equal, then a and b must have been equal to begin with. 
For the simple case where a single position is involved, we have: 

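$$\mathrm{store}(a, i, \mathrm{select}(b, i)) \simeq \mathrm{store}(b, i, \mathrm{select}(a, i)) \supset a \simeq b.$$

For $n > 0$, let STOREINV($n$) = {multiswap($a, b, n$) $\supset a \simeq b$}, where

$$\mathrm{multiswap}(a, b, k) = \begin{cases} (a \simeq b) & \text{if } k = 0, \\ \text{let } (a' \simeq b') = \mathrm{multiswap}(a, b, k - 1) \text{ in} \\ \quad \mathrm{store}(a', i_k, \mathrm{select}(b', i_k)) \simeq \mathrm{store}(b', i_k, \mathrm{select}(a', i_k)) & \text{if } k \geq 1. \end{cases} \tag{26}$$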

Example 7. For $n = 2$ we have

$$\mathrm{store}(a', i_2, \mathrm{select}(b', i_2)) \simeq \mathrm{store}(b', i_2, \mathrm{select}(a', i_2)) \supset a \simeq b$$

where $a' = \mathrm{store}(a, i_1, \mathrm{select}(b, i_1))$ and $b' = \mathrm{store}(b, i_1, \mathrm{select}(a, i_1))$.

Transformation into clausal form of the negation of the formula in STOREINV($n$),
followed by $\mathcal{A}$-reduction and flattening, yields a problem with $O(n)$ clauses.
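
The recursive definition (26) can be sketched in Python as follows. Representing terms as strings and naming the functions `multiswap` and `storeinv` are choices made for this illustration only.

```python
def multiswap(a, b, k):
    """Return the two sides (a_k, b_k) of the equation multiswap(a, b, k) of (26)."""
    if k == 0:
        return a, b
    a1, b1 = multiswap(a, b, k - 1)      # the pair (a', b') of the 'let' in (26)
    return (f"store({a1},i{k},select({b1},i{k}))",
            f"store({b1},i{k},select({a1},i{k}))")

def storeinv(n):
    """The single formula of STOREINV(n): multiswap(a, b, n) implies a = b."""
    lhs, rhs = multiswap("a", "b", n)
    return f"{lhs} = {rhs} -> a = b"
```

Note that the printed term roughly doubles in size at each level, because $a'$ and $b'$ each occur on both sides; the linear clause count stated above relies on flattening, which names each subterm once.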

For STOREINV_INVALID, let $\mathrm{store}(t_a, i_n, \mathrm{select}(t_b, i_n)) \simeq \mathrm{store}(t_b, i_n, \mathrm{select}(t_a, i_n))$
be the formula returned by multiswap($a, b, n$) for $n \geq 2$. Then, we define

STOREINV_INVALID($n$) =
$\{\mathrm{store}(t_a, i_1, \mathrm{select}(t_b, i_n)) \simeq \mathrm{store}(t_b, i_n, \mathrm{select}(t_a, i_n)) \supset a \simeq b\}$.

5.4 Fourth benchmark: IOS(n)

The problems of the IOS($n$) family combine the theories of arrays and integer offsets.
Consider the following two program fragments:

    for(k=1;k<=n;k++)        for(k=1;k<=n;k++)
      a[i+k]=a[i]+k;           a[i+n-k]=a[i+n]-k;

If the execution of either fragment produces the same result in the array a, then
a[i+n]==a[i]+n must hold initially for any value of i, k, a, and n.

Example 8. For $n = 2$, IOS($n$) includes only the following valid formula:

$$\mathrm{store}(\mathrm{store}(a, i + 1, \mathrm{select}(a, i) + 1), i + 2, \mathrm{select}(a, i) + 2) \simeq$$
$$\mathrm{store}(\mathrm{store}(a, i + 1, \mathrm{select}(a, i + 2) - 1), i, \mathrm{select}(a, i + 2) - 2)$$
$$\supset \mathrm{select}(a, i + 2) \simeq \mathrm{select}(a, i) + 2.$$

In general, for $n > 0$ let IOS($n$) = $\{L^n_n \simeq R^n_n \supset \mathrm{select}(a, i + n) \simeq \mathrm{select}(a, i) + n\}$
where

$$L^n_0 = R^n_0 = a$$

$$L^n_k = \mathrm{store}(L^n_{k-1}, i + k, \mathrm{select}(a, i) + k) \quad \text{for } k = 1, \ldots, n$$

$$R^n_k = \mathrm{store}(R^n_{k-1}, i + n - k, \mathrm{select}(a, i + n) - k) \quad \text{for } k = 1, \ldots, n.$$

Each formula in IOS($n$), once negated, reduced to clausal form, flattened and
$\mathcal{I}$-reduced, generates $O(n)$ clauses. $\mathcal{A}$-reduction is not needed, since the negation of
the formula does not contain inequalities of sort ARRAY.
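
The property behind IOS($n$) can be checked by brute force on small cases. The sketch below is only an illustration: it interprets indices and elements as ordinary Python integers rather than terms over the theory of integer offsets, represents arrays as dictionaries, fixes $i = 0$, and explores a small range of initial values; the names `left`, `right` and `ios_holds` are hypothetical.

```python
from itertools import product

def left(a, i, n):
    """Array L_n: store a[i] + k at position i + k, for k = 1..n."""
    out = dict(a)
    for k in range(1, n + 1):
        out[i + k] = a[i] + k
    return out

def right(a, i, n):
    """Array R_n: store a[i + n] - k at position i + n - k, for k = 1..n."""
    out = dict(a)
    for k in range(1, n + 1):
        out[i + n - k] = a[i + n] - k
    return out

def ios_holds(n, values=range(-2, 3)):
    """Check: whenever L_n = R_n, the initial array satisfies a[i+n] == a[i] + n."""
    i = 0
    for cells in product(values, repeat=n + 1):
        a = dict(enumerate(cells))        # initial contents of a[0..n]
        if left(a, i, n) == right(a, i, n) and a[i + n] != a[i] + n:
            return False
    return True
```

The reason the check succeeds is visible in the code: only `right` writes position `i`, and it writes `a[i + n] - n` there, so equality of the two results forces `a[i] == a[i + n] - n`.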

5.5 Fifth benchmark: QUEUE(n) 

The theories of arrays, records and integer offsets can be combined to specify queues,
defined as usual in terms of the functions enqueue : ELEM × QUEUE → QUEUE,
dequeue : QUEUE → QUEUE, first : QUEUE → ELEM, last : QUEUE → ELEM and
reset : QUEUE → QUEUE, where QUEUE and ELEM are the sorts of queues and their
elements, respectively.

Indeed, a queue can be implemented as a record with three fields: items is an
array storing the elements of the queue, head is the index of the first element of
the queue in the array, and tail is the index where the next element will be
inserted in the queue. Following Section 3.1, the signature features function symbols
$\mathrm{rstore}_{items}$, $\mathrm{rstore}_{head}$, $\mathrm{rstore}_{tail}$, $\mathrm{rselect}_{items}$, $\mathrm{rselect}_{head}$ and $\mathrm{rselect}_{tail}$, abbreviated
as $\mathrm{rstore}_i$, $\mathrm{rstore}_h$, $\mathrm{rstore}_t$, $\mathrm{rselect}_i$, $\mathrm{rselect}_h$, $\mathrm{rselect}_t$, respectively. Then, the
above mentioned functions on QUEUE are defined as follows:

$$\mathrm{enqueue}(v, x) = \mathrm{rstore}_t(\mathrm{rstore}_i(x, \mathrm{store}(\mathrm{rselect}_i(x), \mathrm{rselect}_t(x), v)), s(\mathrm{rselect}_t(x)))$$

$$\mathrm{dequeue}(x) = \mathrm{rstore}_h(x, s(\mathrm{rselect}_h(x)))$$

$$\mathrm{first}(x) = \mathrm{select}(\mathrm{rselect}_i(x), \mathrm{rselect}_h(x))$$

$$\mathrm{last}(x) = \mathrm{select}(\mathrm{rselect}_i(x), p(\mathrm{rselect}_t(x)))$$

$$\mathrm{reset}(x) = \mathrm{rstore}_h(x, \mathrm{rselect}_t(x))$$

where $x$ and $v$ are variables of sort QUEUE and ELEM, respectively, store and select
are the function symbols from the signature of $\mathcal{A}$, $p$ and $s$ are the function symbols
for predecessor and successor from the signature of $\mathcal{I}$.
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A basic property of queues is the following: assume that qq is a properly ini- 
tialized queue and q is obtained from qq by performing n + 1 enqueue opera- 
tions (n > 0), that insert n -I- 1 elements eo, ei, . . . , e„, and m dequeue operations 
(0 < m < n), that remove m elements eo, ei, . . . , e-m-i', then first((7) = Cm- Dequeue 
operations can be interleaved with enqueue operations in any order, provided the 
number of dequeue operations is always strictly smaller than the number of pre- 
ceding enqueue operations. For instance, if q — enqueue(e2,dequeue(enqueue(ei, 
enqueue(eo, reset((7o))))), then first((7) = ei. Problems in the QUEUE(n) family ex- 
press an instance of this property, where the dequeue operator is applied once every 
3 applications of the enqueue operator. Thus, the number of dequeue operations 
will be TO = [(n + 1)/3J . 
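The definitions above can be read operationally. The following is a minimal Python sketch, not part of the original benchmarks, that models a queue as an immutable record with the three fields items, head and tail, with the array represented by a dictionary (an assumption made for illustration). It replays the instance with three enqueue operations and one dequeue operation mentioned in the text.

```python
from typing import Any, Dict, NamedTuple


class Queue(NamedTuple):
    """A queue as a record with three fields (items, head, tail)."""
    items: Dict[int, Any]  # array from integer indices to elements
    head: int              # index of the first element
    tail: int              # index where the next element is inserted


def reset(x: Queue) -> Queue:
    # reset(x) = rstore_h(x, rselect_t(x)): head is set to tail
    return x._replace(head=x.tail)


def enqueue(v: Any, x: Queue) -> Queue:
    # store v at index tail, then advance tail with the successor s
    items = dict(x.items)
    items[x.tail] = v
    return Queue(items, x.head, x.tail + 1)


def dequeue(x: Queue) -> Queue:
    # dequeue(x) = rstore_h(x, s(rselect_h(x)))
    return x._replace(head=x.head + 1)


def first(x: Queue) -> Any:
    # first(x) = select(rselect_i(x), rselect_h(x))
    return x.items[x.head]


def last(x: Queue) -> Any:
    # the predecessor p of tail indexes the most recently inserted element
    return x.items[x.tail - 1]


# Three enqueues and one dequeue, starting from an arbitrary record.
q0 = Queue(items={}, head=3, tail=7)
q = enqueue("e2", dequeue(enqueue("e1", enqueue("e0", reset(q0)))))
print(first(q), last(q))
```

The starting values of head and tail are deliberately arbitrary: only reset makes the record a properly initialized queue.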

Given a term $t$ where the function symbols enqueue, dequeue, first, last
and reset may occur, let $t{\downarrow}$ denote the term obtained from $t$ by unfolding the above
function definitions. Then QUEUE(n), for $n > 0$, is defined as follows:

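\[
\mathrm{QUEUE}(n) = \Big\{ \Big( q_0 \simeq \mathrm{reset}(q){\downarrow} \;\wedge\; \bigwedge_{i=0}^{n-1} q_{i+1} \simeq f_{i+1}(e_i, q_i) \Big) \supset \mathrm{first}(q_n){\downarrow} \simeq e_m \Big\},
\]

where

\[
f_i(e, q) =
\begin{cases}
\mathrm{dequeue}(\mathrm{enqueue}(e, q)){\downarrow} & \text{if } i \bmod 3 = 0, \\
\mathrm{enqueue}(e, q){\downarrow} & \text{otherwise,}
\end{cases}
\]

and $m = \lfloor (n + 1)/3 \rfloor$.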

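Example 9. If $n = 1$, then QUEUE(n) is the formula:

\[
\begin{aligned}
& q_0 \simeq \mathrm{rstore}_h(q, \mathrm{rselect}_t(q)) \;\wedge \\
& q_1 \simeq \mathrm{rstore}_t(\mathrm{rstore}_i(q_0, \mathrm{store}(\mathrm{rselect}_i(q_0), \mathrm{rselect}_t(q_0), e_0)), \mathrm{s}(\mathrm{rselect}_t(q_0))) \\
& \supset \mathrm{select}(\mathrm{rselect}_i(q_1), \mathrm{rselect}_h(q_1)) \simeq e_0.
\end{aligned}
\]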
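The unfolding used in Example 9 is purely mechanical. The sketch below is an illustration only, not the generator used for the experiments: it builds the premises and the conclusion of QUEUE(n) as plain strings, with `=` standing for the equality symbol of the formula. The helper names (`reset_t`, `enqueue_t`, `queue_formula`, and so on) are hypothetical.

```python
def reset_t(x):
    return f"rstore_h({x}, rselect_t({x}))"


def enqueue_t(v, x):
    return (f"rstore_t(rstore_i({x}, store(rselect_i({x}), rselect_t({x}), {v})), "
            f"s(rselect_t({x})))")


def dequeue_t(x):
    return f"rstore_h({x}, s(rselect_h({x})))"


def first_t(x):
    return f"select(rselect_i({x}), rselect_h({x}))"


def f_t(i, e, q):
    """Unfolded f_i(e, q): a dequeue follows the enqueue when i mod 3 = 0."""
    t = enqueue_t(e, q)
    return dequeue_t(t) if i % 3 == 0 else t


def queue_formula(n):
    """Return (premises, conclusion) of QUEUE(n) as lists/strings."""
    premises = ["q0 = " + reset_t("q")]
    for i in range(n):
        e, q_prev, q_next = f"e{i}", f"q{i}", f"q{i + 1}"
        premises.append(q_next + " = " + f_t(i + 1, e, q_prev))
    m = (n + 1) // 3                      # m as defined in the text
    conclusion = first_t(f"q{n}") + f" = e{m}"
    return premises, conclusion


premises, conclusion = queue_formula(1)
for p in premises:
    print(p)
print("=>", conclusion)
```

For n = 1 the printed premises and conclusion coincide, up to notation, with the formula of Example 9.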

Each formula in QUEUE(n), once negated, reduced to clausal form, flattened and
I-reduced, generates O(n) clauses. A-reduction and R-reduction are not needed,
since the negation of the formula does not contain inequalities of sort ARRAY or
REC.

5.6 Sixth benchmark: CIRCULAR_QUEUE(n, k)

It is sufficient to replace the theory of integer offsets I by $I_k$, to work with indices
modulo $k$ and extend the approach of the previous section to model circular queues
of length $k$. The problems of the CIRCULAR_QUEUE(n, k) family say that if $q_{n+1}$ is
obtained from a properly initialized circular queue $q_0$ by inserting $n + 1$ elements
$e_0, e_1, \ldots, e_n$, for $n > 0$, and $n \bmod k = 0$, then $\mathrm{first}(q_{n+1}) \simeq \mathrm{last}(q_{n+1})$ holds,
because the last element inserted overwrites the one in the first position (e.g.,
picture inserting 4 elements in a circular queue of length 3). This is formally
expressed by

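\[
\mathrm{CIRCULAR\_QUEUE}(n, k) = \Big\{ \Big( q_0 \simeq \mathrm{reset}(q){\downarrow} \;\wedge\; \bigwedge_{i=0}^{n} q_{i+1} \simeq \mathrm{enqueue}(e_i, q_i){\downarrow} \Big) \supset \mathrm{first}(q_{n+1}){\downarrow} \simeq \mathrm{last}(q_{n+1}){\downarrow} \Big\},
\]

for $n > 0$ such that $n \bmod k = 0$.

Example 10. If $k = n = 1$, then CIRCULAR_QUEUE(1, 1) is the formula:

\[
\begin{aligned}
& q_0 \simeq \mathrm{rstore}_h(q, \mathrm{rselect}_t(q)) \;\wedge \\
& q_1 \simeq \mathrm{rstore}_t(\mathrm{rstore}_i(q_0, \mathrm{store}(\mathrm{rselect}_i(q_0), \mathrm{rselect}_t(q_0), e_0)), \mathrm{s}(\mathrm{rselect}_t(q_0))) \;\wedge \\
& q_2 \simeq \mathrm{rstore}_t(\mathrm{rstore}_i(q_1, \mathrm{store}(\mathrm{rselect}_i(q_1), \mathrm{rselect}_t(q_1), e_1)), \mathrm{s}(\mathrm{rselect}_t(q_1))) \\
& \supset \mathrm{select}(\mathrm{rselect}_i(q_2), \mathrm{rselect}_h(q_2)) \simeq \mathrm{select}(\mathrm{rselect}_i(q_2), \mathrm{p}(\mathrm{rselect}_t(q_2))).
\end{aligned}
\]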

Each formula in CIRCULAR_QUEUE(n, k), once negated, reduced to clausal form,
flattened and I-reduced, generates O(n) clauses.
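The overwriting argument behind CIRCULAR_QUEUE(n, k) can be checked on concrete values. The following Python sketch is an illustration under the assumption that indices are the integers 0, ..., k-1 with successor and predecessor taken modulo k; the function name `circular_first_last` is hypothetical.

```python
def circular_first_last(n, k, start=0):
    """Insert e0..en into a circular queue of length k; return (first, last)."""
    items = {}
    head = tail = start % k            # reset: head is set to tail
    for i in range(n + 1):
        items[tail] = f"e{i}"          # store the new element at tail
        tail = (tail + 1) % k          # successor modulo k
    # first reads at head, last reads at the predecessor of tail (modulo k)
    return items[head], items[(tail - 1) % k]


# 4 elements in a circular queue of length 3: n = 3, k = 3, n mod k = 0
print(circular_first_last(3, 3))
# n mod k != 0: first and last need not coincide
print(circular_first_last(1, 3))
```

When n mod k = 0, the tail index after n + 1 insertions is the successor of the head index, so first and last read the same array cell.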

6. EXPERIMENTS 

The synthetic benchmarks of Section 5 were submitted to three systems: E 0.82, 
CVC 1.0a and CVC Lite 1.1.0. The prover E implements (a variant of) SP with a 
large choice of fair search plans, based on the "given-clause" algorithm [Schulz 2002;
2004], that ensure that the empty clause will be found, if the input is unsatisfiable, 
and a finite satisfiable saturated set will be generated, if the input set is satisfiable 
and admits one. CVC [Stump et al. 2002] and CVC Lite [Barrett and Berezin 2004] 
combine several theory decision procedures following the Nelson-Oppen method,
including that of [Stump et al. 2001] for arrays with extensionality, and integrate 
them with a SAT engine [Barrett et al. 2002a]. CVC is no longer supported; it 
was superseded by CVC Lite, a more modular and programmable system. While 
CVC Lite has many advantages, at the time of these experiments CVC was reported 
to be still faster on many problems. CVC and CVC Lite feature a choice of SAT 
solvers: a built-in solver or Chaff [Moskewicz et al. 2001] for CVC, a "fast" or a 
"simple" solver for CVC Lite. In our experiments, CVC and CVC Lite performed 
consistently better with their built-in and "fast" solver, respectively, and therefore 
only those results are reported. 
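For readers unfamiliar with the "given-clause" loop, the following toy Python sketch shows its overall shape. It is not E's implementation: ground propositional resolution stands in for the inference system SP, there is no contraction, and the clause selection (smallest clause first) is an assumption chosen for simplicity. Clauses are sets of non-zero integers, with -p denoting the negation of p.

```python
def resolvents(c1, c2):
    """All non-tautological propositional resolvents of two clauses."""
    out = set()
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):      # skip tautologies
                out.add(frozenset(r))
    return out


def given_clause(clauses):
    """Saturate a finite clause set; report the empty clause if it is derived."""
    unprocessed = {frozenset(c) for c in clauses}
    processed = set()
    while unprocessed:
        # pick the "given" clause: smallest first (fair on finite inputs)
        given = min(unprocessed, key=lambda c: (len(c), sorted(c)))
        unprocessed.remove(given)
        if not given:                            # empty clause found
            return "unsatisfiable", processed
        processed.add(given)
        # inferences between the given clause and all processed clauses
        for other in list(processed):
            for r in resolvents(given, other):
                if r not in processed:
                    unprocessed.add(r)
    return "satisfiable", processed              # finite saturated set


print(given_clause([[1], [-1, 2], [-2]])[0])
print(given_clause([[1, 2], [-1, 2]])[0])
```

The two possible outcomes mirror the description in the text: the empty clause is found on an unsatisfiable input, and a finite saturated set is returned otherwise.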

We wrote a generator of pseudo-random instances of the synthetic benchmarks, 
producing either TPTP^11 or CVC syntax, and a set of scripts to run the solvers
on all benchmarks. The generator creates either T-reduced, flattened input files
or plain input files. Flattening times were not included in the reported run times, 
because flattening is a one-time linear time operation, and the time spent on flat- 
tening was insignificant. In the following, native input means flattened, T-reduced 
files for E, and plain, unflattened files for CVC and CVC Lite. 
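To make the cost of flattening concrete, here is a minimal Python sketch of one way to flatten a ground term by naming each compound subterm with a fresh constant. Flattening itself is defined earlier in the paper; the term representation (strings for constants, tuples for applications), the naming scheme `c1, c2, ...` and the function name `flatten` are assumptions made for illustration. Each subterm is visited once, which is consistent with the linear-time remark above.

```python
import itertools


def flatten(term, defs, fresh):
    """Return a constant naming `term`; record flat definitions in `defs`.

    Terms are strings (constants) or tuples (f, arg1, ..., argn).
    `defs` maps flat terms f(c1, ..., cn) to the constant that names them.
    """
    if isinstance(term, str):
        return term
    f, *args = term
    flat = (f, *(flatten(a, defs, fresh) for a in args))
    if flat not in defs:                 # equal subterms share one name
        defs[flat] = f"c{next(fresh)}"
    return defs[flat]


defs = {}
fresh = itertools.count(1)
top = flatten(("store", "a", "i", ("select", "a", "j")), defs, fresh)
for flat_term, const in defs.items():
    print(flat_term, "=", const)
print("top-level constant:", top)
```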

The experiments were performed on a 3.00GHz Pentium 4 PC with 512MB RAM. 
Time and memory were limited to 150 sec and 256 MB per instance. If a system ran 
out of either time or memory under these limits, the result was recorded as a
"failure." When Pb(n) is not a singleton (cf. STORECOMM(n), STORECOMM_INVALID(n),
SWAP(n) and SWAP_INVALID(n)), the median run time over all tested instances is
reported.^12 For the purpose of computing the median, a failure is considered to be
larger than all successful run times. The median was chosen in place of the average, 
precisely because it is well-defined even in cases where a system fails on some, but 
not all instances of a given size, a situation that occurred for all systems. 
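The treatment of failures in the median can be written down in a few lines. In this Python sketch, which is an illustration rather than the scripts used for the experiments, a failure is represented by `None` and ranked as infinity, so that it is larger than every successful run time.

```python
import math


def median_with_failures(times):
    """Median of run times, where None marks a failure (time or memory limit)."""
    ranked = sorted(math.inf if t is None else t for t in times)
    mid = len(ranked) // 2
    if len(ranked) % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


# 9 instances, 4 failures: the median is still a finite run time
print(median_with_failures([0.4, 0.5, 0.6, 0.7, 0.8, None, None, None, None]))
# 9 instances, 5 failures: the median itself is a failure
print(median_with_failures([0.4, 0.5, 0.6, 0.7, None, None, None, None, None]))
```

An average would be undefined as soon as a single instance fails, which is the reason given in the text for preferring the median.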



^11 TPTP, or "Thousands of Problems for Theorem Provers," is a de facto standard for testing
general-purpose first-order theorem provers: see http://www.tptp.org/.

^12 Reported figures refer to runs with 9 instances for every value of n. Different numbers of
instances (e.g., 5, 20) were also tried, but the impact on the plots was negligible.

The results for E refer to two variants of a simple strategy, termed E(good-lpo)
and E(std-kbo), for reasons that will be clear shortly. This strategy adopts a single
priority queue for clause selection, where E(good-lpo) gives the same priority to
all clauses, while E(std-kbo) privileges ground clauses. Additionally, E(good-lpo)
ensures that all input clauses are selected before the generated ones, whereas E(std- 
kbo) does not. Both variants employ a very simple clause evaluation heuristic to 
rank clauses of equal priority; it weights clauses by counting symbols, giving weight 
2 to function and predicate symbols and weight 1 to variable symbols. Since these 
are the default term weights that E uses for a variety of operations, they are pre- 
computed and cached, so that this heuristic is very fast, compared to more complex 
schemes. 
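The symbol-counting heuristic is simple enough to sketch directly. The Python fragment below is an illustration, not E's code: terms are strings or tuples `(symbol, arg1, ..., argn)`, variables are recognized by the TPTP convention of an initial upper-case letter, and counting the equality symbol of a literal as a predicate symbol is an assumption of this sketch.

```python
def is_variable(symbol):
    # TPTP convention: variables start with an upper-case letter
    return symbol[0].isupper()


def term_weight(term, f_weight=2, v_weight=1):
    """Weight 2 per function/predicate symbol, weight 1 per variable."""
    if isinstance(term, str):
        return v_weight if is_variable(term) else f_weight
    symbol, *args = term
    return f_weight + sum(term_weight(a, f_weight, v_weight) for a in args)


def clause_weight(literals):
    """Sum of the weights of the atoms of a clause (signs are ignored here)."""
    return sum(term_weight(lit) for lit in literals)


# select(store(A, I, E), I) = E, one of the array axioms as a unit clause
atom = ("=", ("select", ("store", "A", "I", "E"), "I"), "E")
print(clause_weight([atom]))
```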

E(std-kbo) features a Knuth-Bendix ordering (KBO), where the weight of symbols
is given by their arity, and the precedence sorts symbols by arity first, and by inverse
input frequency second (that is, rarer symbols are greater), with ties broken by order
of appearance in the input. This KBO is R-good, I-good, and it satisfies Condition
(1), but not Condition (2), in Definition 12, so that it is not A-good. It was included
because it is a typical ordering for first-order theorem proving, and therefore
E(std-kbo) can be considered representative of the behavior of a plain, standard,
theorem-proving strategy. E(good-lpo) has a lexicographic (recursive) path ordering (LPO),
whose precedence extends that of E(std-kbo), in such a way that constants are
ordered by sort. Thus, Condition (2) in Definition 12 is also satisfied and the
resulting LPO is R-good, I-good and A-good. In both precedences, constants
introduced by flattening are smaller than those in the original signature. It is 
worth emphasizing that contemporary provers, such as E, can generate precedences 
and weighting schemes automatically. The only human intervention was a minor 
modification in the code to enable the prover to recognize the sort of constants and 
satisfy Condition (2) in Definition 12. 

6.1 Experiments with STORECOMM and STORECOMM_INVALID

Many problems involve distinct objects, that is, constants which name elements 
that are known to be distinct in all models of the theory. E features a complete
variant of SP that builds knowledge of the existence of distinct objects into the 
inference rules [Schulz and Bonacina 2005]. Under this refinement, the prover 
treats strings in double quotes and positive integers as distinct objects. This aspect 
is relevant to the STORECOMM problems, because they include the inequalities in
$\{i_l \not\simeq i_m : (l, m) \in C^n_2\}$, stating that all indices are distinct. Thus, E was applied
to these problems in two ways: with $\{i_l \not\simeq i_m : (l, m) \in C^n_2\}$ included in the input
(axiomatized indices), and with array indices in double quotes (built-in index type).
Figure 3 shows that all systems solved the problems comfortably and scaled 
smoothly. On valid instances, E(good-lpo) with axiomatized indices and CVC Lite 
show nearly the same performance, with E apparently slightly ahead in the limit. 
E(good-lpo) with built-in indices outperformed CVC Lite by a factor of about 2.5. 
CVC performed best, improving by another factor of 2. It is somewhat surprising
that E, a theorem prover optimized for showing unsatisfiability, performed com- 
paratively even better on invalid (that is, satisfiable) instances, where it was faster 
than CVC Lite, and E(good-lpo) with built-in indices came closer to CVC. The 
shared characteristics of all the plots strongly suggest that for STORECOMM the most
important feature is sheer processing speed. Although there is no deep reasoning or
search involved, the general-purpose prover can hold its own against the specialized
solvers, and even edge out CVC Lite.

Fig. 3. Performance on valid (left) and invalid (right) STORECOMM instances with native
input.

Fig. 4. Performance on valid (left) and invalid (right) STORECOMM instances with flat input
for all.



When all systems ran on flattened input (Figure 4), both CVC and CVC Lite 
exhibited run times approximately two times higher than with native format, and 
CVC Lite turned out to be the slowest system. CVC and E with built-in indices 
were the fastest: on valid instances, their performances are so close, that the plots 
coincide, but E is faster on invalid instances. It is not universally true that
flattening hurt CVC and CVC Lite: on the SWAP problems CVC Lite performed much
better on flattened input. This suggests that specialized decision procedures are
not insensitive to input format.

Although CVC was overall the fastest system on STORECOMM, E was faster than 
CVC Lite, and did better than CVC on invalid instances when they were given the 
same input. As CVC may be considered a paradigmatic representative of optimized 
systems with built-in theories, it is remarkable that the general-purpose prover 
could match CVC and outperform CVC Lite. 

6.2 Experiments with SWAP and SWAP_INVALID

Fig. 5. Performance on valid (left) and invalid (right) SWAP instances, native input.

Rather mixed results arose for SWAP, as shown in Figure 5. Up to instance size
5, the systems are very close. Beyond this point, on valid instances, E leads up to
size 7, but then is overtaken by CVC and CVC Lite. E could solve instances of size
8, but was much slower than CVC and CVC Lite, which solved instances up to size
9. No system could solve instances of size 10. For invalid instances, E easily solved
instances up to size 10 in less than 0.5 sec. CVC and CVC Lite were much slower
there, taking 2 sec and 4 sec, respectively. Their asymptotic behaviour seems to be
clearly worse.

Consider the lemma 

\[
\mathrm{store}(\mathrm{store}(x, z, \mathrm{select}(x, w)), w, \mathrm{select}(x, z)) \simeq \mathrm{store}(\mathrm{store}(x, w, \mathrm{select}(x, z)), z, \mathrm{select}(x, w))
\]

that expresses "commutativity" of store. Figure 6 displays the systems' performance
on valid instances, when the input for E includes this lemma. Although
this addition means that the theorem prover is no longer a decision procedure,^13 E
terminated also on instances of size 9 and 10, and its plot suggests a better
asymptotic behavior. While no system emerged as a clear winner, this experiment shows
how a prover that takes a theory presentation in input offers an additional degree
of freedom, because useful lemmata may be added.

Fig. 6. Performance on valid SWAP instances with added lemma for E.

^13 Lemma 14 and therefore Theorem 3.6 do not hold, if this lemma is added to presentation A.
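The "commutativity" lemma states that swapping the contents of two cells gives the same array whichever cell is written first. The following Python sketch is a sanity check on small concrete arrays, not a proof: arrays are modelled as tuples, and `select` and `store` are the usual read and functional-update operations.

```python
from itertools import product


def select(a, i):
    return a[i]


def store(a, i, e):
    b = list(a)
    b[i] = e
    return tuple(b)


def lemma_holds(x, z, w):
    lhs = store(store(x, z, select(x, w)), w, select(x, z))
    rhs = store(store(x, w, select(x, z)), z, select(x, w))
    return lhs == rhs


# exhaustive check over all arrays of length 3 with elements in {a, b}
ok = all(lemma_holds(x, z, w)
         for x in product("ab", repeat=3)
         for z in range(3)
         for w in range(3))
print(ok)
```

The check includes the case z = w, where both sides reduce to the original array.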

6.3 Experiments with STOREINV and STOREINV_INVALID

Fig. 7. Performance on valid (left) and invalid (right) STOREINV instances with native
input.



The comparison becomes even more favorable for the prover on the STOREINV 
problems, reported in Figure 7. CVC solved valid instances up to size 8 within the 
given resource limit. CVC Lite went up to size 9, but E solved instances of size 
10, the largest generated. A comparison of absolute run times at size 8, the largest
solved by all systems, gives 3.4 sec for E, 11 sec for CVC Lite, and 70 sec for CVC.
Furthermore, E(std-kbo) (not shown in the figure) solved valid instances in nearly
constant time, taking less than 0.3 sec for the hardest problem. Altogether, E with
a suitable ordering was clearly qualitatively superior to the dedicated systems.
For invalid instances, E did not do as well, but the run times there were minimal, 
with the largest run time for instances of size 10 only about 0.1 sec. 

6.4 Experiments with IOS

Fig. 8. Performance on the IOS instances: since in the graph on the left the curve for
CVC is barely visible, the graph on the right shows a rescaled version of the same data,
including only the three fastest systems.

The IOS problems were encoded for CVC and CVC Lite by using their built-in
linear arithmetic, on the reals for CVC and on the integers for CVC Lite. We
tried to use inductive types in CVC, but it performed badly and even reported
incorrect results.^14 In terms of performance (Figure 8, left), CVC was clearly the
best system, as expected from a tool with built-in arithmetic. E(good-lpo) was no
match, although it still solved all tried instances (Figure 8, left). On the other hand,
E(std-kbo) proved to be competitive with the systems with built-in arithmetic. At
least two reasons explain why E(std-kbo) behaved much better than E(good-lpo):
first, KBO turned out to be more suitable than LPO for these benchmarks; second,
by not preferring initial clauses, the search plan of E(std-kbo) did not consider the
acyclicity and array axioms early in the search, a choice that turned out to be
good. More remarkably, E(std-kbo) did better than CVC Lite (Figure 8, right):
its curve scales smoothly, while CVC Lite displays oscillating run times, showing
worse performance for even instance sizes than for odd ones.



^14 This is a known bug, that will not be fixed since CVC is no longer supported [Stump 2005].
CVC Lite 1.1.0 does not support inductive types. 


6.5 Experiments with QUEUE and CIRCULAR_QUEUE

Fig. 9. Performance of E, CVC and CVC Lite on QUEUE (left) and CIRCULAR_QUEUE (right)
with k = 3.



Similar to the IOS tests, CVC and CVC Lite were expected to enjoy a great
advantage over E on the QUEUE and CIRCULAR_QUEUE problems, because both CVC
and CVC Lite build all theories involved in these benchmarks, namely arrays,
records and linear arithmetic, into their decision procedures. The diagram on the
left of Figure 9 confirms this expectation, showing that CVC was the fastest system
on the QUEUE(n) problems. However, E(good-lpo) was a good match for CVC Lite,
and E(std-kbo) (not reported in the figure) also solved all tried instances. The
plot on the right of Figure 9 refers to the experiments with CIRCULAR_QUEUE(n, k),
where k = 3. It does not include CVC, because CVC cannot handle the modulo-k
integer arithmetic required for circular queues. Between CVC Lite and E, the latter
demonstrated a clear superiority: E(good-lpo) exhibited nearly linear performance,
and proved the largest instance in less than 0.5 sec, nine times faster than CVC Lite.
E(std-kbo) behaved similarly.

6.6 Experiments with "real-world" problems 

While synthetic benchmarks test scalability, "real-world" problems such as those 
from the UCLID suite [Lahiri and Seshia 2004] test performance on huge sets of 
literals. UCLID is a system that reduces all problems to propositional form without 
theory reasoning. Thus, in order to get problems relevant to our study, we used 
haRVey [Deharbe and Ranise 2003] to extract T-satisfiability problems from various 
UCLID inputs. This resulted in 55,108 proof tasks in the combination of the theory 
of integer offsets and the quantifier-free theory of equality, so that I-reduction was
applied next. We ran E on all of them, using a cluster of 3 PC's with 2.4GHz 
Pentium-4 processors. All other parameters were the same as for the synthetic 
benchmarks.

Fig. 10. Distribution of run times for E in automatic mode (left) and using an optimized
strategy (right) for the UCLID test set.

These problems (all valid) turned out to be easy for E in automatic mode, where 
the prover chooses automatically ordering and search plan. It could solve all prob- 
lems, taking less than 4 sec on the hardest one, with average 0.372 sec and median 
0.25 sec. Figure 10 shows a histogram of run times: the vast majority of problems 
was solved in less than 1 sec and very few needed between 1.5 and 3 sec. An op- 
timized search plan was found by testing on a random sample of 500 problems, or 
less than 1% of the full set. With this search plan, very similar to E(std-kbo), the 
performance improved by about 40% (Figure 10, right): the average is 0.249 sec, 
the median 0.12 sec, the longest time 2.77 sec, and most problems were solved in 
less than 0.5 sec. 

7. DISCUSSION 

The application of automated reasoning to verification has long shown the im- 
portance of decision procedures for satisfiability in decidable theories. The most 
common approach to these procedures, popularized as the "little" proof engines 
paradigm [Shankar 2002], works by building each theory T into a dedicated infer- 
ence engine. For T-satisfiability procedures, that decide conjunctions of ground 
T-literals, the mainstay is the congruence closure algorithm, enhanced by building 
T into the algorithm (e.g., [Nelson and Oppen 1980; Stump et al. 2001; Lahiri 
and Musuvathi 2005b; 2005a; Dutertre and de Moura 2006; Nieuwenhuis and Oliv- 
eras 2007]). Such procedures are combined according to the scheme of [Nelson and 
Oppen 1979] or its variants (e.g., [Tinelli and Harandi 1996; Barrett et al. 2002b; 
Ganzinger 2002; Ghilardi 2004; Baader and Ghilardi 2005; Ranise et al. 2005]). 
Recent systematic treatments appeared in [Krstic and Conchon 2003; Manna and 
Zarba 2003; Conchon and Krstic 2003; Ranise et al. 2004; Ganzinger et al. 2004; 
Ghilardi et al. 2005]. 
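As a reminder of what the congruence closure algorithm computes, here is a deliberately naive Python sketch for ground equalities with uninterpreted function symbols. It is a quadratic fixpoint iteration over a union-find structure, written for readability; it is not the algorithm of any of the cited systems, and no theory is built into it. Terms are strings (constants) or tuples `(f, arg1, ..., argn)`.

```python
def subterms(t, acc):
    """Collect t and all its subterms into the set acc."""
    acc.add(t)
    if isinstance(t, tuple):
        for a in t[1:]:
            subterms(a, acc)
    return acc


def congruence_closure(equations, extra_terms=()):
    """Return a `find` function mapping each term to its class representative."""
    terms = set()
    for s, t in equations:
        subterms(s, terms)
        subterms(t, terms)
    for t in extra_terms:
        subterms(t, terms)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def union(s, t):
        parent[find(s)] = find(t)

    for s, t in equations:
        union(s, t)
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:                       # propagate congruences to a fixpoint
        changed = False
        for s in apps:
            for t in apps:
                if (find(s) != find(t) and s[0] == t[0] and len(s) == len(t)
                        and all(find(a) == find(b)
                                for a, b in zip(s[1:], t[1:]))):
                    union(s, t)
                    changed = True
    return find


def f(t):
    return ("f", t)


# classic example: f^3(a) = a and f^5(a) = a entail f(a) = a
find = congruence_closure([(f(f(f("a"))), "a"), (f(f(f(f(f("a"))))), "a")])
print(find(f("a")) == find("a"))
```

A T-satisfiability check for the pure theory of equality then amounts to verifying that no disequality in the input relates two terms with the same representative.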

For T-decision procedures, that decide arbitrary quantifier-free T-formulas, the 
so-called "eager" approaches seek efficient reductions of the problems to SAT and
submit them to SAT solvers (e.g., [Jackson and Vaziri 2000; Bryant and Velev 2001;
Bryant et al. 2002; Seshia et al. 2003; Meir and Strichman 2005]). The so-called 
"lazy" approaches (e.g., [de Moura et al. 2002; Barrett et al. 2002a; Flanagan et al. 
2003; Deharbe and Ranise 2003; Barrett and Berezin 2004; Ganzinger et al. 2004; 
Detlefs et al. 2005; Bozzano et al. 2005; Bozzano et al. 2005; Nieuwenhuis and 
Oliveras 2007]) integrate T-satisfiability procedures based on congruence closure
with SAT solvers, usually based on the Davis-Putnam-Logemann-Loveland proce- 
dure (e.g., [Chang and Lee 1973; Moskewicz et al. 2001]). The resulting systems 
are called SMT solvers. 

By symmetry with little proof engines, we used "big" proof engines (e.g., [Stickel
2002]), for theorem-proving strategies for full first-order logic with equality, as
implemented in state-of-the-art general-purpose theorem provers (e.g., [McCune 2003;
Weidenbach et al. 1999; Riazanov and Voronkov 2002; Schulz 2002]). There has
always been a continuum between big and little engines of proof, as testified by the
research on reasoning modulo a theory in big engines. The rewriting approach to
T-satisfiability aims at a cross-fertilization where big engines work as little engines.
The general idea is to explore how the technology of big engines (orderings, infer- 
ence rules, search plans, algorithms, data structures, implementation techniques) 
may be applied selectively and efficiently "in the small," that is, to decide specific 
theories. 

This exploration finds its historical roots in the relationship between congruence 
closure and Knuth-Bendix completion [Knuth and Bendix 1970] in the ground case: 
the application of ground completion to compute congruence closure was discovered 
as early as [Lankford 1975]; the usage of congruence closure to generate canonical 
rewrite systems from sets of ground equations was investigated further in [Gallier 
et al. 1993; Plaisted and Sattler-Klein 1996]; more recently, a comparison of ground 
completion and congruence closure algorithms was given in [Bachmair et al. 2003]; 
and ground completion and congruence closure were included in an abstract frame- 
work for canonical inference in [Bonacina and Dershowitz 2007]. The central
component of the rewriting approach is the inference system SP, that is an offspring of a
long series of studies on completion for first-order logic with equality. These systems 
were called by various authors rewrite-based, completion-based, superposition-based, 
paramodulation-based, contraction-based, saturation-based or ordering-based, to em- 
phasize one aspect or the other. Relevant surveys include [Plaisted 1993; Bonacina 
1999; Nieuwenhuis and Rubio 2001; Dershowitz and Plaisted 2001]. 

The gist of the rewriting approach to T-satisfiability is to show that a sound and
refutationally complete "big" engine, such as SP, is guaranteed to generate finitely
many clauses from T-satisfiability problems. By adding termination to soundness
and completeness, one gets a decision procedure: an SP-strategy that combines SP
with a fair search plan is a T-satisfiability procedure. Depending on the theory,
termination may require some problem transformation, termed T-reduction, which
is fully mechanizable for all the theories we have studied. We emphasize that the
inference system is not adapted to the theories. The only requirement is on the
ordering: SP is parametric with respect to a CSO, and the termination proof for
T may require that this ordering satisfies some property named T-goodness. For
all considered theories, T-goodness is very simple and easily satisfied by common
orderings such as RPO's and KBO's.

We proved termination of SP on several new theories, including one with infinite
axiomatization^15. We gave a general modularity theorem for the combination of
theories, and carried out an experimental evaluation, to test the practical feasibility
of the rewriting approach. The modularity theorem states sufficient conditions
(no shared function symbols, variable-inactive theories) for SP to terminate on
a combination of theories if it terminates on each theory separately. The "no
shared function symbols" hypothesis is common for combination results. Variable- 
inactivity is satisfied by all equational theories with no trivial models. First-order 
theories that fail to be variable-inactive in an intuitive way are not stably infinite, 
and therefore cannot be combined by the Nelson-Oppen scheme either. On the other 
hand, it follows from work in [Bonacina et al. 2006] that variable-inactive theories 
are stably infinite. The quantifier-free theories of equality, lists, arrays with or 
without extensionality, records with or without extensionality, integer offsets and 
integer offsets modulo, all satisfy these requirements, so that any fair SP-strategy
is a satisfiability procedure for any of their combinations. The theories of arrays 
and possibly empty lists are not convex, and therefore cannot be combined by the 
Nelson-Oppen scheme without case analysis. 

A different approach to big engines in the context of theory reasoning, and espe- 
cially combination of theories, appeared in [Ganzinger et al. 2006]: the combination 
or extension of theories is conceived as mixing total and partial functions, and a 
new inference system with partial superposition is introduced to handle them. Be- 
cause the underlying notion of validity is modified to accommodate partial functions,
the emphasis of [Ganzinger et al. 2006] is on defining the new inference system and
proving its completeness. The realization of such an approach requires implementing
the new inference system. The essence of our methodology, on the other hand,
is to leave the inference system and its completeness proof unchanged, and prove 
termination to get decision procedures. This allows us to take existing theorem 
provers "off the shelf." 

For the experimental evaluation, we designed six sets of synthetic benchmarks on 
the theory of arrays with extensionality and combinations of the theories of arrays, 
records and integer offsets or integer offsets modulo. For "real-world problems," 
we considered satisfiability benchmarks extracted from the UCLID suite. Our ex- 
perimental comparison between the SP-based E prover and the validity checkers
CVC and CVC Lite is a first of its kind, and offers many elements for reflection 
and suggestions for future research. 

The analysis of the traces of the theorem prover showed that these satisfiabil- 
ity problems behave very differently compared to more typical theorem-proving 
problems. Classical proof tasks involve a fairly large set of axioms, a rich signature, 
many universally quantified variables, many unit clauses usable as rewrite rules and 
many mixed positive/negative literal clauses. The search space is typically infinite, 
and only a very small part of it gets explored. In T-satisfiability problems, input 
presentations are usually very small and there is a large number of ground rewrite 
rules generated by flattening. The search space is finite, but nearly all of it has
to be explored, before unsatisfiability (validity) can be shown. Table I compares
the behavior of E on some medium-difficulty unsatisfiable array problems and some
representative TPTP problems of similar difficulty for the prover.

^15 Our termination result for the theory of integer offsets was generalized in [Bonacina and Echenim
2007b] to the theories of recursive data structures as defined in [Oppen 1980].

Problem            Initial   Generated   Processed   Remaining   Unnecessary
name               clauses   clauses     clauses     clauses     inferences

STORECOMM(60)/1       1896        2840        4323           7        26.4%
STOREINV(5)             27       22590        7480          31        95.5%
SWAP(8)/3               62       73069       21743          56        98.2%
SET015-4                15       39847        7504       16219       99.90%
FLD032-1                31       44192        3964       31642       99.96%
RNG004-1                20       50551        4098       26451       99.90%

The data are for E in automatic mode. STORECOMM(60)/1 is one of the problems in
STORECOMM(n) for n = 60, STOREINV(5) is STOREINV(n) for n = 5 and SWAP(8)/3 is one of
the problems in SWAP(n) for n = 8. The others are representative problems from
TPTP 3.0.0. The sum of processed and remaining clauses may be smaller than the sum
of initial and generated clauses, because E removes newly generated trivial clauses, as
well as unprocessed clauses whose parents become redundant, without counting them as
processed. The final column shows the percentage of all inferences (expansion and
contraction) that did not contribute to the final proof.

Table I. Performance characteristics of array and TPTP problems.
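The remark on clause counts in the note to Table I can be checked with simple arithmetic. The Python sketch below takes the four count columns of the table as given and computes, for each problem, how many clauses are neither processed nor remaining; the variable names are hypothetical.

```python
# (initial, generated, processed, remaining) clause counts from Table I
rows = {
    "STORECOMM(60)/1": (1896, 2840, 4323, 7),
    "STOREINV(5)": (27, 22590, 7480, 31),
    "SWAP(8)/3": (62, 73069, 21743, 56),
    "SET015-4": (15, 39847, 7504, 16219),
    "FLD032-1": (31, 44192, 3964, 31642),
    "RNG004-1": (20, 50551, 4098, 26451),
}

# clauses removed without being counted as processed
discarded = {
    name: (initial + generated) - (processed + remaining)
    for name, (initial, generated, processed, remaining) in rows.items()
}

for name, count in discarded.items():
    print(f"{name:16} {count:6d}")
```

The same data show the contrast discussed in the text: the array problems end with a few dozen remaining clauses at most, while the TPTP problems leave tens of thousands unprocessed.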

Most search plans and features of first-order provers were designed assuming 
the search space characteristics of typical first-order problems. Thus, the theorem 
prover turned out to be competitive with the little-engines systems, although it 
was optimized for different search problems. This means that not only is using a 
theorem prover already a viable option in practice, but there is a clear potential to 
improve both performance and usability, by studying implementation techniques of 
first-order inferences that target T-satisfiability, by designing theory-specific search 
plans, and by equipping the prover with the ability to recognize which theories ap- 
pear in the input set. The prover also terminated in many cases beyond the known 
termination results (cf. Figure 6 and the runs with E(std-kbo), whose ordering is 
not A-good). Thus, theorem provers are not as brittle as one may fear with
respect to termination, and still offer the flexibility of adding useful lemmata to the
presentation, as shown in Section 6.2. 

The above remarks suggest that stronger termination results may be sought. The 
complexity of the rewrite-based procedures may be improved by adopting theory- 
specific search plans. Methods to extract models from saturated sets can be inves- 
tigated to complement proof finding with model generation, which is important for 
applications. For instance, in verification, a model represents a counter-example 
to a conjecture of correctness of a system. The ability to generate models marks 
the difference between being able to tell that there are errors (by reporting "sat- 
isfiable"), and being able to give some information on the errors (by reporting 
"satisfiable" and a model). Since we do not expect the rewrite-based approach to 
work for full linear arithmetic, another quest is how to integrate it with methods 
for arithmetic [Rueß and Shankar 2004] or other theories such as bit-vectors [Cyrluk
et al. 1996]. Research on integration with the latter theory began in [Kirchner et al. 
2005].

Most verification problems involve arbitrary quantifier-free T-formulae, or,
equivalently, sets of ground T-clauses. Thus, a major open issue is how to apply big
engines towards solving general T-decision problems. Since SP is an inference
system for general first-order clauses, a set of ground T-clauses may be submitted
to an SP-based prover. Showing that an SP-strategy is a T-decision procedure
requires extending the termination results from sets of ground literals to sets of
ground clauses. Sufficient conditions for termination of SP on T-decision problems
were given recently in [Bonacina and Echenim 2007a]. However, verification prob- 
lems of practical interest typically yield large sets with huge non-unit clauses, and 
first-order provers are not designed to deal with large disjunctions as efficiently as 
SAT solvers. 

In practice, a more plausible approach could be to follow a simple "lazy" scheme 
(e.g., [Barrett et al. 2002a; Flanagan et al. 2003]), and integrate a rewrite-based T-
satisfiability procedure with a SAT solver that generates assignments. The rewrite- 
based T-solver produces a proof, whenever it detects unsatisfiability, and can be 
made incremental to interact with the SAT solver according to the "lazy" scheme. 
However, the state of the art in SMT solvers indicates that a tight integration of 
the two solvers is required to achieve high performances. SAT solvers are based on 
case analysis by backtracking, whereas rewrite-based inference engines are proof- 
confluent, which means they need no backtracking. While proof confluence is an 
advantage in first-order theorem proving, this dissimilarity means that a tight
integration of SAT solver and rewrite-based T-solver requires addressing the issues
posed by the interplay of two very different kinds of control. 
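The division of labour in a simple "lazy" scheme can be illustrated by a toy Python sketch. Everything here is an assumption made for illustration: the "SAT solver" is a brute-force enumeration of truth assignments (standing in for a real DPLL-based solver), the "T-solver" decides only equalities and disequalities between constants with a union-find structure (standing in for a rewrite-based or congruence-closure procedure), and no proofs or conflict clauses are exchanged.

```python
from itertools import product


def t_satisfiable(literals):
    """Equality T-solver: literals are (a, b, is_equal) over constants."""
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            c = parent[c]
        return c

    for a, b, is_equal in literals:
        if is_equal:
            parent[find(a)] = find(b)
    # every disequality must relate constants in different classes
    return all(find(a) != find(b) for a, b, is_equal in literals if not is_equal)


def lazy_smt(atoms, clauses):
    """atoms: list of (a, b) equalities; clauses: lists of ints, +i/-i for atom i (1-based)."""
    for bits in product([True, False], repeat=len(atoms)):
        # "SAT solver": keep assignments that satisfy every clause
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            literals = [(a, b, v) for (a, b), v in zip(atoms, bits)]
            if t_satisfiable(literals):      # "T-solver" vets the assignment
                return bits
    return None


atoms = [("a", "b"), ("b", "c"), ("a", "c")]
print(lazy_smt(atoms, [[1], [2], [-3]]))     # a=b, b=c, a!=c
print(lazy_smt(atoms, [[1], [-2, 3]]))       # a=b, and b=c implies a=c
```

The sketch makes the loose coupling visible: the propositional search knows nothing about the theory, which is exactly the point where the text observes that tighter integration is needed for high performance.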

In current work we are taking a different route: we are exploring ways to
decompose T-decision problems, in such a way that the big engine acts as a pre-processor
for an SMT solver, doing as much theory reasoning as possible in the pre-processing 
phase. In this way we hope to combine the strength of a prover, such as E, in equa- 
tional reasoning with that of an SMT solver in case analysis. Even more general
problems require reasoning with universally quantified variables, that SMT solvers
handle only by heuristics, following the historical lead of [Detlefs et al. 2005]. In
summary, big engines are strong at reasoning with equalities, universally quantified
variables and Horn clauses. Little engines are strong at reasoning with
propositional logic, non-Horn clauses and arithmetic. The reasoning environments of the
future will have to harmonize their forces.